A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that 
for each required output there exists a reducer that receives all the inputs that participate in the computation 
of this output. Reducers have a capacity, which limits the sets of inputs that they can be assigned. However, 
individual inputs may vary in terms of size. We consider, for the first time, mapping schemas where input sizes are 
part of the considerations and restrictions. One of the significant parameters to optimize in any MapReduce job is 
communication cost between the map and reduce phases. The communication cost can be optimized by minimizing 
the number of copies of inputs sent to the reducers. The communication cost is closely related to the number of 
reducers of constrained capacity that are used to accommodate appropriately the inputs, so that the requirement 
of how the inputs must meet in a reducer is satisfied. In this work, we consider a family of problems where it is 
required that each input meets with each other input in at least one reducer. We also consider a slightly different 
family of problems in which each input of a list, X, is required to meet each input of another list, Y, in at least one 
reducer. We prove that finding an optimal mapping schema for these families of problems is NP-hard, and present a 
bin-packing-based approximation algorithm for finding a near-optimal mapping schema. 

Categories and Subject Descriptors: H.2.4 [Systems]: Parallel Databases; H.2.4 [Systems]: Distributed Databases; 
C.2.4 [Distributed Systems]: Distributed Databases 

General Terms: Design, Algorithms, Performance 

Additional Key Words and Phrases: Distributed computing, mapping schema, MapReduce algorithms, reducer 
capacity, and reducer capacity and communication cost tradeoff 


1. INTRODUCTION 

MapReduce [Dean and Ghemawat 2004] is a programming system used for parallel 
processing of large-scale data. It has two phases, the map phase and the reduce phase. The 
given input data is processed by the map phase, which applies a user-defined map function to 
produce intermediate data (of the form ⟨key, value⟩). Intermediate data is then processed 
by the reduce phase, which applies a user-defined reduce function to keys and their associated 
values. The final output is provided by the reduce phase. A detailed description of MapReduce 
can be found in Chapter 2 of [Leskovec et al. 2014]. 
Communication Cost and Reducer Capacity. An important performance measure for 
MapReduce algorithms is the amount of data transferred from the mappers (the processes 
that implement the map function) to the reducers (the processes that implement the reduce 
function). This is called the communication cost. The minimum communication cost is, of 
course, the size of the desired inputs that provide the final output, since we need to transfer 
all these inputs from the mappers to the reducers at least once. However, we may need to 
transfer the same input to several reducers, thus increasing the communication cost. 

Depending on various factors of our setting, each reducer may process a larger or smaller 
amount of data. The amount of data each reducer processes, however, affects the wall clock 
time of our algorithms and the degree of parallelization. If we send all data to one reducer, 
then we have low communication (equal to the size of the data) but also a low degree of 
parallelization, and thus the wall clock time increases. Thus, the maximum amount of data a 
reducer can hold is a constraint when we build our algorithm. 

Reducer capacity. We define reducer capacity to be the upper bound on the sum of the sizes of 
the values that are assigned to the reducer. For example, we may choose the reducer capacity 
to be the size of the main memory of the processor on which the reducer runs or we may 
arbitrarily set a low reducer capacity if we want high parallelization. We always assume in 
this paper that all the reducers have an identical capacity, denoted by q. 

There are various works in the field of MapReduce algorithm design (e.g., [Karloff et al. 2010; 
Afrati et al. 2013; Goodrich 2010; Ullman 2012; Pietracaprina et al. 2012; 
Afrati and Ullman 2013]) that investigate problems and/or build algorithms with minimum 
communication cost when the reducer size is bounded by the number of inputs that a reducer 
is allowed to hold. In this paper, we consider for the first time problems where each input 
may have a different size and the reducer capacity is an upper bound on the sum of the sizes 
of the inputs in a reducer. Here, we investigate the problem where each input is required to 
meet in a reducer with every other input. We now give some examples where this problem may 
appear in practice. 


Motivating Examples. We present four examples. 

Example 1.1. Computing common friends. An input is a list of friends. We have such lists 
for m persons. Each pair of lists of friends corresponds to one output, which will show us 
the common friends of the respective persons. Thus, it is mandatory that lists of friends of 
every two persons are compared. Specifically, the problem is: a list $F = \{f_1, f_2, \ldots, f_m\}$ of $m$ 
lists of friends is given, and each pair of elements $(f_i, f_j)$ corresponds to one output, the common friends 
of persons $i$ and $j$; see Figure 1. 

Fig. 1: Computing common friends example (m lists of friends). 

Fig. 2: Skew join example for a heavy hitter, $b_1$. 

Example 1.2. Similarity-join. Similarity-join is an example of the A2A mapping schema 
problem that can be used to find the similarity between any two inputs, e.g., Web pages 
or documents. A set of $m$ inputs (e.g., Web pages) $WP = \{wp_1, wp_2, \ldots, wp_m\}$, a similarity 
function $sim(x, y)$, and a similarity threshold $t$ are given, and each pair of inputs $(wp_x, wp_y)$ 
corresponds to one output such that $sim(wp_x, wp_y) > t$. 

It is necessary to compare all-pairs of inputs when the similarity measure is sufficiently 
complex that shortcuts like locality-sensitive hashing are not available. Therefore, it is 
mandatory that every two inputs (Web pages) of the given input set ($WP$) are compared. 
The similarity-join is useful in various applications, mentioned in [Bayardo et al. 2007], 
e.g., near-duplicate document detection, collaborative filtering, and query refinement for Web 
search. 

Example 1.3. The drug-interaction problem. The drug-interaction problem is given 
in [Ullman 2012], where a list of inputs consists of 6,500 drugs and a drug i holds information 
about the medical history of patients who had taken the drug i. The objective is to find pairs 
of drugs that had particular side effects. In order to achieve the objective, it is mandatory 
that each pair of drugs is compared. 

Example 1.4. Skew join of two relations $X(A, B)$ and $Y(B, C)$. The join of relations $X(A, B)$ 
and $Y(B, C)$, where the joining attribute is $B$, provides output tuples $(a, b, c)$, where $(a, b)$ is in 
$X$ and $(b, c)$ is in $Y$. One or both of the relations $X$ and $Y$ may have a large number of tuples 
with an identical $B$-value. A value of the joining attribute $B$ that occurs many times is known 
as a heavy hitter. In a skew join of $X(A, B)$ and $Y(B, C)$, all the tuples of both relations with 
an identical heavy hitter should appear together to provide the output tuples. 

In Figure 2, $b_1$ is considered as a heavy hitter; hence, it is required that all the tuples of 
$X(A, B)$ and $Y(B, C)$ with the heavy hitter, $B = b_1$, should appear together to provide the 
desired output tuples, $(a, b_1, c)$ ($a \in A$, $b_1 \in B$, $c \in C$), which depend on exactly two inputs. 


Problem Statements. We define two problems where exactly two inputs are required for 
computing an output: 

All-to-All problem. In the all-to-all (A2A) problem, a list of inputs is given, and each pair of 
inputs corresponds to one output. 

X-to-Y problem. In the X-to-Y (X2Y) problem, two disjoint lists $X$ and $Y$ are given, and each 
pair of elements $(x_i, y_j)$, where $x_i \in X$, $y_j \in Y$, $\forall i, j$, of the lists $X$ and $Y$ corresponds to one 
output. 
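
The two problem statements differ only in which pairs of inputs must produce an output. The following is a minimal sketch of that difference; the function names and the toy input labels are illustrative assumptions, not notation from this paper.

```python
from itertools import combinations, product

def a2a_outputs(inputs):
    """A2A: one output per unordered pair of distinct inputs."""
    return list(combinations(inputs, 2))

def x2y_outputs(xs, ys):
    """X2Y: one output per pair (x, y), x taken from X and y from Y."""
    return list(product(xs, ys))

# Hypothetical toy instances.
pairs_a2a = a2a_outputs(["i1", "i2", "i3", "i4"])
pairs_x2y = x2y_outputs(["x1", "x2", "x3"], ["y1", "y2"])
print(len(pairs_a2a), len(pairs_x2y))
```

For m inputs the A2A problem has m(m - 1)/2 required pairs, while the X2Y problem with lists of lengths m and n has m·n required pairs.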

Computing common friends on a social networking site, similarity-join, and the drug-interaction 
problem are examples of A2A problems. Skew join is an example of an X2Y problem. 

A mapping schema defines a MapReduce algorithm. A mapping schema assigns inputs to 
reducers, so that no reducer exceeds the reducer capacity and all pairs of inputs (in the A2A 
problem) or all pairs of X-to-Y inputs (in the X2Y problem) meet in at least one common reducer.¹ 

The communication cost is a significant factor in the performance of a MapReduce 
algorithm. The communication cost comes with a tradeoff in the degree of parallelism, as 
we mentioned. A mapping schema is optimal if there is no other mapping schema with a 
lower communication cost. In this paper, we investigate how to construct optimal mapping 
schemas or good approximations of them. 


Outline of Paper and Our Contribution. In this paper, we investigate the problem of 
finding an optimal or near optimal mapping schema for the case we have inputs of different 
sizes. 


— In Section 2 we warm up to the problem by discussing how the tradeoffs appear. 

— In Section 3 we prove that finding an optimal mapping schema is intractable. 

— In Section 4 we present preliminary results and one of our techniques for obtaining 
near-optimal mapping schemas. The technique is to do bin-packing first and collect inputs 
in bins, then treat bins as inputs, possibly all of equal size. 


¹For more general problems, we are given a graph that defines which pairs of inputs should meet in the same 
reducer to solve the problem, and this is what the mapping schema should achieve; we do not consider such 
problems here. 
— In Section 5 we present algorithms to construct optimal mapping schemas in certain cases 
where the inputs are all of equal size. 

— In later sections we combine bin-packing and algorithmic techniques from Section 5 
to build algorithms that construct mapping schemas that are good approximations to the 
optimal. For each algorithm, we argue at the end how good an approximation it is. 

— In Section 7 we extend the idea presented in Section 5.3 for equal-sized inputs. 

— Up to that point we consider only the case where there is no input of size $> \frac{q}{2}$ 
(recall that $q$ denotes the reducer capacity). Thus, in Section 9 we investigate the 
case where there is an input of size $> \frac{q}{2}$. We mainly use techniques similar to those of the preceding sections. 

— So far we have investigated the A2A problem. In Section 10 we take up the X2Y problem and 
provide algorithms for it too. 


Related Work. MapReduce was introduced by Dean and Ghemawat in 2004 [Dean and 
Ghemawat 2004]. Karloff et al. [Karloff et al. 2010] present a model for comparing 
MapReduce with the Parallel Random Access Machine (PRAM) model and state that a 
large class of PRAM algorithms can be simulated by MapReduce. However, the combination of parallel and 
sequential computations used in MapReduce differentiates MapReduce from the PRAM model. 
Another model, suggested in [Goodrich 2010], considers the efficiency of MapReduce algorithms in terms of an algorithm's 
running time. The author simulates PRAM algorithms by 
MapReduce and defines memory bounds for MapReduce algorithms in terms of reducer I/O 
sizes for each round and each reducer. 

Following [Karloff et al. 2010; Goodrich 2010], a filtering technique for MapReduce is 
suggested in [Lattanzi et al. 2011]. This technique removes some of the nonessential data and 
results in fewer rounds than in both of the previously stated models [Karloff et al. 2010; Goodrich 
2010]. Essentially, the models in [Karloff et al. 2010; Goodrich 2010; Lattanzi et al. 2011] 
provide a way to simulate a large family of PRAM algorithms by MapReduce. 

Afrati et al. [Afrati et al. 2013] present a model for MapReduce algorithms where an 
output depends on two inputs, and show a tradeoff between the communication cost and 
parallelism. In [Afrati and Ullman 2013], the authors consider a case where each pair of 
inputs produces an output and present an upper bound that meets the lower bound on the 
communication cost as a function of the number of inputs sent to a reducer. However, both in 
[Afrati et al. 2013] and [Afrati and Ullman 2013] the authors regard the reducer capacity in 
terms of the number of inputs (assuming each input is of an identical size) sent to a reducer. 

Our setting is closely related to the setting given by Afrati et al. [Afrati et al. 2013], but 
we allow the input sizes to be different. To the best of our knowledge, we are the first 
not to restrict the input sizes to be identical. Thus, we consider a more realistic setting for 
MapReduce algorithms that can be used in various practical scenarios. 


2. MAPPING SCHEMA AND TRADEOFFS 

Our system setting is an extension of the standard system setting [Afrati et al. 2013] for 
MapReduce algorithms, where we consider, for the first time, inputs of different sizes. In 
this section, we provide formal definitions and some examples to show the tradeoff between 
communication cost and degree of parallelization. 


Mapping Schema. A mapping schema is an assignment of the set of inputs to some given 
reducers so that the following two constraints are satisfied: 

— A reducer is assigned inputs whose sum of the sizes is less than or equal to the reducer 
capacity q. 

— For each output, we must assign its corresponding inputs to at least one reducer in 
common. 


A mapping schema is optimal when the communication cost is minimum. The number of 
reducers we use is often minimal for an optimal mapping schema, but this may not always 
be the case. It is desirable to minimize the number of reducers too. We offer insight about 
the communication cost and the number of reducers used in Examples 2.1 and 2.2. 
Tradeoffs. The following tradeoffs appear in MapReduce algorithms and in particular in our 
setting: 

— A tradeoff between the reducer capacity and the number of reducers. For example, large 
reducer capacity allows the use of a smaller number of reducers. 

— A tradeoff between the reducer capacity and the parallelism. For example, if we want to 
achieve a high degree of parallelism, we set low reducer capacity. 

— A tradeoff between the reducer capacity and the communication cost. For example, in the 
case reducer capacity is equal to the total size of the data then we can use one reducer and 
have minimum communication (of course, this goes at the expense of parallelization). 

In the subsequent subsections, we present the A2A mapping schema problem and the X2Y 
mapping schema problem with fitting examples and explain the tradeoffs. 

2.1. The A2A Mapping Schema Problem 

An instance of the A2A mapping schema problem consists of a list of $m$ inputs whose input 
size list is $W = \{w_1, w_2, \ldots, w_m\}$ and a set of $z$ identical reducers of capacity $q$. A solution 
to the A2A mapping schema problem assigns every pair of inputs to at least one reducer in 
common, without exceeding $q$ at any reducer. 


[Figure 3. Input sizes: $w_1 = w_2 = w_3 = 0.20q$, $w_4 = w_5 = 0.19q$, $w_6 = w_7 = 0.18q$. 
Left panel: the first way to assign inputs, using six reducers (non-optimum communication cost). 
Right panel: the second way to assign inputs, using three reducers (optimum communication cost).] 

Fig. 3: An example to the A2A mapping schema problem. 


Example 2.1. We are given a list of seven inputs $I = \{i_1, i_2, \ldots, i_7\}$ whose size list is 
$W = \{0.20q, 0.20q, 0.20q, 0.19q, 0.19q, 0.18q, 0.18q\}$ and reducers of capacity $q$. In Figure 3 we 
show two different ways that we can assign the inputs to reducers. The best we can do to 
minimize the communication cost is to use three reducers. However, there is less parallelism 
at the reduce phase as compared to when we use six reducers. Observe that when we use 
six reducers, all reducers have a lighter load, since each reducer may have capacity less 
than $0.8q$. 

The communication cost for the second case (3 reducers) is approximately $3q$, whereas for 
the first case (6 reducers) it is approximately $4.2q$. Thus, as a tradeoff, in the 3-reducer case 
we have a low communication cost but also a lower degree of parallelization, whereas in the 
6-reducer case we have high parallelization at the expense of the communication cost. 
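
The two constraints of an A2A mapping schema, and its communication cost, can be checked mechanically. The sketch below does so for the input sizes of Example 2.1, measured in hundredths of q. The two concrete assignments are assumptions: they are one possible pair of schemas with six and three reducers that is consistent with the example, not necessarily the exact assignments drawn in the original figure.

```python
from itertools import combinations

def is_valid_a2a(schema, sizes, q):
    """True if no reducer exceeds q and every pair of inputs shares a reducer."""
    if any(sum(sizes[i] for i in reducer) > q for reducer in schema):
        return False
    covered = set()
    for reducer in schema:
        covered.update(combinations(sorted(reducer), 2))
    required = set(combinations(sorted(sizes), 2))
    return required <= covered

def communication_cost(schema, sizes):
    """Total size of all input copies sent to reducers."""
    return sum(sizes[i] for reducer in schema for i in reducer)

# Sizes of Example 2.1 in hundredths of q, so q = 100.
q = 100
sizes = {1: 20, 2: 20, 3: 20, 4: 19, 5: 19, 6: 18, 7: 18}

# Assumed assignments (hypothetical reconstruction).
six_reducers = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 5, 6},
                {3, 4, 7}, {1, 2, 7}, {5, 6, 7}]
three_reducers = [{1, 2, 3, 4, 5}, {1, 2, 5, 6, 7}, {3, 4, 5, 6, 7}]

for name, schema in [("six", six_reducers), ("three", three_reducers)]:
    print(name, is_valid_a2a(schema, sizes, q),
          communication_cost(schema, sizes) / q)
```

Both assumed schemas are valid; the three-reducer schema replicates fewer input copies overall, while the six-reducer schema keeps every reducer's load below 0.8q.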


2.2. The X2Y Mapping Schema Problem 

An instance of the X2Y mapping schema problem consists of two disjoint lists $X$ and $Y$ and 
a set of identical reducers of capacity $q$. The inputs of the list $X$ are of sizes $w_1, w_2, \ldots, w_m$, 
and the inputs of the list $Y$ are of sizes $w'_1, w'_2, \ldots, w'_n$. A solution to the X2Y mapping schema 
problem assigns every two inputs, the first from one list, $X$, and the second from the other 
list, $Y$, to at least one reducer in common, without exceeding $q$ at any reducer. 

Example 2.2. We are given two lists, $X$ of 12 inputs, and $Y$ of 4 inputs (see Figure 4), and 
reducers of capacity $q$. We show that we can assign each input of the list $X$ with each input 
of the list $Y$ in two ways. In order to minimize the communication cost, the best way is to use 
12 reducers. Note that we cannot obtain a solution for the given inputs using fewer than 12 
reducers. However, the use of 12 reducers results in less parallelism at the reduce phase as 
compared to when we use 16 reducers. 
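
The same kind of check applies to the X2Y problem. The sketch below uses the sizes of Example 2.2 in hundredths of q. The two assignments are assumptions chosen to match the reducer counts in the example (pairs of X inputs with pairs of Y inputs for 12 reducers; triples of X inputs with single Y inputs for 16 reducers); they may differ in detail from the original figure.

```python
from itertools import product

def is_valid_x2y(schema, x_sizes, y_sizes, q):
    """True if no reducer exceeds q and every (x, y) pair shares a reducer."""
    covered = set()
    for xs, ys in schema:
        load = sum(x_sizes[i] for i in xs) + sum(y_sizes[j] for j in ys)
        if load > q:
            return False
        covered.update(product(xs, ys))
    return covered >= set(product(x_sizes, y_sizes))

def x2y_cost(schema, x_sizes, y_sizes):
    """Total size of all input copies sent to reducers."""
    return sum(sum(x_sizes[i] for i in xs) + sum(y_sizes[j] for j in ys)
               for xs, ys in schema)

q = 100
x_sizes = dict(enumerate([25, 25, 24, 24, 23, 23, 22, 22, 21, 21, 20, 20], 1))
y_sizes = dict(enumerate([25, 25, 24, 24], 1))

# Assumed 12-reducer schema: two X inputs and two Y inputs per reducer.
x_pairs = [(i, i + 1) for i in range(1, 13, 2)]
twelve = [(xp, yp) for xp in x_pairs for yp in [(1, 2), (3, 4)]]

# Assumed 16-reducer schema: three X inputs and one Y input per reducer.
x_triples = [(i, i + 1, i + 2) for i in range(1, 13, 3)]
sixteen = [(xt, (j,)) for xt in x_triples for j in y_sizes]

for schema in (twelve, sixteen):
    print(len(schema), is_valid_x2y(schema, x_sizes, y_sizes, q),
          x2y_cost(schema, x_sizes, y_sizes) / q)
```

Under these assumptions the 12-reducer schema has the lower communication cost, and the 16-reducer schema spreads the work over more, slightly lighter reducers.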
[Figure 4. Inputs of list X: $w_1 = w_2 = 0.25q$, $w_3 = w_4 = 0.24q$, $w_5 = w_6 = 0.23q$, 
$w_7 = w_8 = 0.22q$, $w_9 = w_{10} = 0.21q$, $w_{11} = w_{12} = 0.20q$. 
Inputs of list Y: $w'_1 = w'_2 = 0.25q$, $w'_3 = w'_4 = 0.24q$. 
Left panel: the first way to assign inputs, using 12 reducers. 
Right panel: the second way to assign inputs, using 16 reducers.] 

Fig. 4: An example to the X2Y mapping schema problem. 


In this paper, we assume that we have made a decision on the degree of parallelization we want 
(by setting the reducer capacity $q$). 


3. INTRACTABILITY OF FINDING A MAPPING SCHEMA 

In this section, we show that the A2A and the X2Y mapping schema problems are NP-hard, so they do not 
possess a polynomial-time solution unless P = NP. In other words, the assignment of every two 
required inputs to the minimum number of identical-capacity reducers, as required by 
the A2A and the X2Y mapping schema problems, cannot be achieved in polynomial time unless P = NP. 

3.1. NP-hardness of the A2A Mapping Schema Problem 

A list of inputs $I = \{i_1, i_2, \ldots, i_m\}$ whose input size list is $W = \{w_1, w_2, \ldots, w_m\}$ and a set 
of identical reducers $R = \{r_1, r_2, \ldots, r_z\}$ are an input instance to the A2A mapping schema 
problem. The A2A mapping schema problem is a decision problem that asks whether or not 
there exists a mapping schema for the given input instance such that every input, $i_x$, is 
assigned with every other input, $i_y$, to at least one reducer in common. An answer to the A2A 
mapping schema problem will be “yes,” if for each pair of inputs ($\langle i_x, i_y \rangle$), there is at least one 
reducer that holds them. 

In this section, we prove that the A2A mapping schema problem is NP-hard in the case of 
$z > 2$ identical reducers. In addition, we prove that the A2A mapping schema problem has a 
polynomial solution for one and two reducers. 

If there is only one reducer, then the answer is “yes” if and only if the sum of the input sizes 
$\sum_{i=1}^{m} w_i$ is at most $q$. On the other hand, if $q < \sum_{i=1}^{m} w_i$, then the answer is “no.” In case of 
two reducers, if a single reducer is not able to accommodate all the given inputs, then there 
must be at least one input that is assigned to only one of the reducers, and hence, this input 
is not paired with all the other inputs. In that case, the answer is “no.” Therefore, we achieve 
a polynomial solution to the A2A mapping schema problem for one and two identical-capacity 
reducers. 

We now consider the case of $z > 2$ and prove that the A2A mapping schema problem for 
$z > 2$ reducers is at least as hard as the partition problem. 

Theorem 3.1. The problem of finding whether a mapping schema of $m$ inputs of different 
input sizes exists, where every two inputs are assigned to at least one of $z > 2$ identical-capacity 
reducers, is NP-hard. 

The proof appears in the Appendix. 
3.2. NP-hardness of the X2Y Mapping Schema Problem 

Two lists of inputs, $X = \{i_1, i_2, \ldots, i_m\}$ whose input size list is $W_x = \{w_1, w_2, \ldots, w_m\}$ and 
$Y = \{i'_1, i'_2, \ldots, i'_n\}$ whose input size list is $W_y = \{w'_1, w'_2, \ldots, w'_n\}$, and a set of identical 
reducers $R = \{r_1, r_2, \ldots, r_z\}$ are an input instance to the X2Y mapping schema problem. 
The X2Y mapping schema problem is a decision problem that asks whether or not there 
exists a mapping schema for the given input instance such that each input of the list X is 
assigned with each input of the list Y to at least one reducer in common. An answer to the 
X2Y mapping schema problem will be “yes,” if for each pair of inputs, the first from X and the 
second from Y, there is at least one reducer that has both those inputs. 

The X2Y mapping schema problem has a polynomial solution for the case of a single 
reducer. If there is only one reducer, then the answer is “yes” if and only if the sum of the 
input sizes $\sum_{i=1}^{m} w_i + \sum_{i=1}^{n} w'_i$ is at most $q$. On the other hand, if $q < \sum_{i=1}^{m} w_i + \sum_{i=1}^{n} w'_i$, then 
the answer is “no.” Next, we will prove that the X2Y mapping schema problem is an NP-hard 
problem for $z > 1$ identical reducers. 

Theorem 3.2. The problem of finding whether a mapping schema of $m$ and $n$ inputs 
of different input sizes that belong to list $X$ and list $Y$, respectively, exists, where every 
two inputs, the first from $X$ and the second from $Y$, are assigned to at least one of $z > 1$ 
identical-capacity reducers, is NP-hard. 

The proof appears in the Appendix. 

4. APPROXIMATION ALGORITHMS: PRELIMINARY RESULTS 

Since the A2A mapping schema problem is NP-hard, we start by looking at special cases and 
developing approximation algorithms to solve it. We propose several approximation algorithms 
for the A2A mapping schema problem that are based on bin-packing algorithms, selection of 
a prime number p, and division of inputs into two sets based on their sizes. 

Each algorithm takes the number of inputs, their sizes, and the reducer capacity (see 
Table II). The approximation algorithms have two cases depending on the sizes of the inputs, 
as follows: 

(1) Input sizes are upper bounded by $\frac{q}{2}$. 

(2) One input is of size, say $w_i$, greater than $\frac{q}{2}$, but less than $q$, and all the other inputs have 
size less than or equal to $q - w_i$. In this case most of the communication cost comes from 
having to pair the large input with every other input. 

Of course, if the sum of the sizes of the two largest inputs is greater than the given reducer capacity $q$, then 
there is no solution to the A2A mapping schema problem because these two inputs cannot be 
assigned to a single reducer in common. 

Parameters for analysis. We analyze our approximation algorithms on the following 
parameters of the mapping schema created by those algorithms: 

(1) Number of reducers. This is the number of reducers used by the mapping schema to send 
all inputs to. 

(2) The communication cost, c. The communication cost is defined to be the sum of all the bits 
that are required, according to the mapping schema, to transfer from the map phase to the 
reduce phase. 

Table I summarizes all the results in this paper. Before describing the algorithms, we look at 
lower bounds for the above parameters as they are expressed in terms of the reducer capacity 
q and sum of sizes of all inputs s. 

Theorem 4.1. (Lower bounds on the communication cost and number of 
reducers) For a list of inputs and a given reducer capacity $q$, the communication cost and the 
number of reducers, for the A2A mapping schema problem, are at least $\frac{s^2}{q}$ and $\frac{s^2}{q^2}$, respectively, 
where $s$ is the sum of all the input sizes. 
Cases | Theorems | Communication cost | Approximation ratio
The lower bounds for the A2A mapping schema problem
Different-sized inputs | 4.1 | $\frac{s^2}{q}$ |
Equal-sized inputs | 5.1 | [illegible] |
The lower bounds for the X2Y mapping schema problem
Different-sized inputs | 10.1 | $\frac{2 \cdot sum_x \cdot sum_y}{q}$ |
Optimal algorithms for the A2A mapping schema problem (* equal-sized inputs)
Algorithm for reducer capacity $q = 2$ | 5.6 | $m(m - 1)$ | optimal
Algorithm for reducer capacity $q = 3$ | 5.6 | [illegible] | optimal
The AU method: when $q$ is a prime number | 5.6 | [illegible] | optimal
Non-optimal algorithms for the A2A mapping schema problem and their upper bounds
Bin-packing-based algorithm, not including an input of size $> \frac{q}{2}$ | 4.5 | $\frac{4s^2}{q}$ | $\frac{1}{4}$
Algorithm 1 | 6.3 | [illegible] | $1/(k - 1)$
Algorithm 2: The first extension of the AU method | 7.1 | $qp(p + 1) + z'$ | $q/(q + 1)$
Algorithm 3: The second extension of the AU method | 7.5 | [illegible] | $(q^l - 1)/q(q - 1)(q + 1)^{l-1}$
Bin-packing-based algorithm considering an input of size $> \frac{q}{2}$ | 9.1 | $(m - 1) \cdot q +$ [illegible] | [illegible]
A non-optimal algorithm for the X2Y mapping schema problem and their upper bounds
Bin-packing-based algorithm, $q = 2b$ | 10.2 | $\frac{4 \cdot sum_x \cdot sum_y}{b}$ | $\frac{1}{4}$

Approximation ratio. The ratio between the optimal communication cost and the communication cost obtained from an 
algorithm. 

Notations: $s$: sum of all the input sizes. $q$: the reducer capacity. $m$: the number of inputs. $sum_x$: sum of input sizes of the 
list $X$. $sum_y$: sum of input sizes of the list $Y$. $p$: the nearest prime number to $q$. $l > 2$. $k > 1$. 

Table I: The bounds for heuristics for the A2A and the X2Y mapping schema problems. 


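Proof. Since an input $i$ is replicated to at least $\left\lceil \frac{s - w_i}{q - w_i} \right\rceil$ reducers, the communication cost 
for the input $i$ is $w_i \times \left\lceil \frac{s - w_i}{q - w_i} \right\rceil$. Hence, the communication cost for all the inputs will be at least 
$\sum_{i=1}^{m} w_i \frac{s - w_i}{q - w_i}$. Since $s \geq q$, we can conclude $\frac{s - w_i}{q - w_i} \geq \frac{s}{q}$. Thus, the communication cost is at least 
$\sum_{i=1}^{m} w_i \frac{s}{q} = \frac{s^2}{q}$. 

Since the communication cost, the number of bits to be assigned to reducers, is at least $\frac{s^2}{q}$, 
and a reducer can hold inputs whose sum of the sizes is at most $q$, the number of reducers 
must be at least $\frac{s^2}{q^2}$. □ 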
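
As a hedged illustration of Theorem 4.1, the sketch below evaluates both lower bounds, $s^2/q$ and $s^2/q^2$, for the input sizes of Example 2.1 expressed in hundredths of q. The function name is an assumption made for this illustration, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from math import ceil

def a2a_lower_bounds(sizes, q):
    """Lower bounds of Theorem 4.1: (communication cost, number of reducers).

    The bounds are stated for s >= q, where s is the sum of the input sizes.
    """
    s = sum(sizes)
    return Fraction(s * s, q), Fraction(s * s, q * q)

# Example 2.1 sizes in hundredths of q, so q = 100 and s = 134.
sizes = [20, 20, 20, 19, 19, 18, 18]
q = 100
min_cost, min_reducers = a2a_lower_bounds(sizes, q)

# Cost in units of q, and the smallest integer number of reducers allowed.
print(float(min_cost / q), ceil(min_reducers))
```

Because the number of reducers is an integer, the bound $s^2/q^2$ is rounded up when it is applied to a concrete instance.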


4.1. Bin-packing-based Approximation 

Our general strategy for building approximation algorithms is as follows: we use a known 
bin-packing algorithm to place the given $m$ inputs into bins of size $\frac{q}{k}$, $k \geq 2$. Assume that we 
need $x$ bins to place the $m$ inputs. Now, each of these bins is considered as a single input of size $\frac{q}{k}$ 
for our problem of finding an optimal mapping schema. Of course, the assumption is that all 
inputs are of size at most $\frac{q}{k}$, $k \geq 2$. 

First-Fit Decreasing (FFD) and Best-Fit Decreasing (BFD) [Coffman et al. 1997] are the most 
notable bin-packing algorithms. The FFD or BFD bin-packing algorithm ensures that all the bins 
(except only one bin) are at least half-full. There also exists a pseudo-polynomial bin-packing 
algorithm, suggested by Karger and Scott [Karger and Scott 2008], that can place the $m$ 
inputs in as few bins as possible of a certain size. 
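
For readers unfamiliar with First-Fit Decreasing, the following is a minimal sketch of the standard algorithm: sort the inputs by decreasing size and place each one into the first bin that still has room, opening a new bin when none does. The function name and the sample sizes (those of Example 2.1 in hundredths of q, packed into bins of size q/2) are choices made for this illustration.

```python
def first_fit_decreasing(sizes, bin_size):
    """Pack sizes into bins of capacity bin_size using First-Fit Decreasing."""
    bins = []  # each bin is a list of the input sizes placed in it
    for size in sorted(sizes, reverse=True):
        if size > bin_size:
            raise ValueError("an input is larger than the bin size")
        for current in bins:
            if sum(current) + size <= bin_size:
                current.append(size)  # first bin with enough room
                break
        else:
            bins.append([size])  # no existing bin fits: open a new one
    return bins

# Sizes in hundredths of q; bins of size q/2 = 50.
bins = first_fit_decreasing([20, 20, 20, 19, 19, 18, 18], 50)
print(len(bins), bins)
```

This simple version rescans all open bins for every input, which takes quadratic time in the worst case; that is adequate for a sketch, and faster implementations exist.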
Algorithms | Inputs
Non-optimal algorithms for the A2A mapping schema problem
Bin-packing-based algorithm | Any number of inputs of any size
Algorithm 1 | Any number of inputs of size at most $\frac{q}{k}$, $k > 3$
Algorithm 2: The first extension of the AU method | $p^2 + p \cdot l + l$, $p + l = q$, $l > 2$
Algorithm 3: The second extension of the AU method | $q^l$, $l > 2$ and $q$ is a prime number
A non-optimal algorithm for the X2Y mapping schema problem
Bin-packing-based algorithm, [illegible] | Any number of inputs of any size

Notations: $w_i$ and $w_j$: the two largest size inputs of a list. $p$: the nearest prime number to $q$. $w_k$: the largest input of a list 
$X$. $w'_k$: the largest input of a list $Y$. 

Table II: Reducer capacity and input constraints for different algorithms for the mapping 
schema problems. 


Example 4.2. Let us discuss in more detail the case $k = 2$. In this case, since the reducer 
capacity is $q$, any two bins can be assigned to a single reducer. Hence, the approximation 
algorithm uses at most $\frac{x(x-1)}{2}$ reducers, where $x$ is the number of bins; see Figure 5 for an 
example. 


[Figure 5. Seven inputs are packed into four bins, each of size $\frac{q}{2}$; every pair of bins is then assigned 
to one reducer, giving six reducers.] 

Fig. 5: Bin-packing-based approximation algorithm. 
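
The second step of the strategy for $k = 2$, assigning every pair of bins to one reducer, can be sketched as follows. The bin contents are hypothetical input identifiers (four bins holding seven inputs), and the function name is an assumption made for this illustration.

```python
from itertools import combinations

def reducers_from_bins(bins):
    """For k = 2: one reducer per unordered pair of bins (x(x-1)/2 reducers)."""
    return [sorted(a + b) for a, b in combinations(bins, 2)]

# Hypothetical packing of inputs 1..7 into four bins of size q/2.
bins = [[1, 2], [3, 4], [5, 6], [7]]
reducers = reducers_from_bins(bins)
print(len(reducers), reducers)
```

Each bin has size at most q/2, so the union of any two bins fits within the reducer capacity q, and any two inputs meet either inside their shared bin or in the reducer that holds their two bins. With a single bin no pair of bins exists, so that degenerate case needs one reducer holding the only bin.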


For this strategy, a lower bound on the communication cost depends also on $k$ as follows: 


Theorem 4.3 (Lower bound on the communication cost). Let $q > 1$ be the reducer 
capacity, and let $\frac{q}{k}$, $k > 1$, be the bin size. Let the sum of the given inputs be $s$. The 
communication cost, for the A2A mapping schema problem, is at least $s \left\lfloor \frac{sk/q - 1}{k - 1} \right\rfloor$. 


Proof. A bin can hold inputs whose sum of the sizes is at most $\frac{q}{k}$. Since the total sum of 
the sizes is $s$, it is required to divide the inputs into at least $x = \frac{sk}{q}$ bins. Now, each bin can be 
considered as an identical sized input. 

Since a bin $i$ is required to be sent to at least $\left\lfloor \frac{x-1}{k-1} \right\rfloor$ reducers (to be paired with all the other 
bins), the sum of the number of copies of ($x$) bins sent to reducers is at least $x \left\lfloor \frac{x-1}{k-1} \right\rfloor$. We need 
to multiply this by $\frac{q}{k}$ (the size of each bin) to find the communication cost. Thus, we have at 
least 

$$x \left\lfloor \frac{x-1}{k-1} \right\rfloor \frac{q}{k} = \frac{sk}{q} \left\lfloor \frac{sk/q - 1}{k-1} \right\rfloor \frac{q}{k} = s \left\lfloor \frac{sk/q - 1}{k-1} \right\rfloor$$ 

communication cost. □ 
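
The bound of Theorem 4.3 is easy to evaluate for concrete values. The sketch below does so with exact rational arithmetic; the function name and the sample values ($s = 10q$, with sizes measured in hundredths of q) are assumptions made for this illustration.

```python
from fractions import Fraction
from math import floor

def bin_strategy_lower_bound(s, q, k):
    """Theorem 4.3: s * floor((s*k/q - 1) / (k - 1)) for bins of size q/k."""
    x = Fraction(s * k, q)                   # minimum number of bins
    copies_per_bin = floor((x - 1) / (k - 1))  # reducers each bin is sent to
    return s * copies_per_bin

# Hypothetical instance: q = 100 (hundredths of q) and total input size s = 10q.
q, s = 100, 1000
for k in (2, 4):
    print(k, bin_strategy_lower_bound(s, q, k) / q)
```

For this instance the bound for $k = 2$ is $190q$, compared with the strategy-independent bound $s^2/q = 100q$ of Theorem 4.1.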
The communication cost in the above theorem, as expected, is larger than the one in 
Theorem 4.1, where no restriction to a specific strategy was taken into account. 


Example 4.4. Example for $k = 2$. Let us apply our strategy to the case where $k = 2$, i.e., 
we have the algorithm: (i) we do bin-packing to put the inputs in bins of size $\frac{q}{2}$ and (ii) we 
provide a mapping schema for assigning each pair of bins to at least one reducer. Such a 
schema is easy and has been discussed in the literature (e.g., [Ullman 2012]). 


FFD and BFD bin-packing algorithms provide an $\frac{11}{9} \cdot OPT$ approximation ratio [Johnson 
1973], i.e., if any optimal bin-packing algorithm needs $OPT$ bins to place ($m$) inputs in the bins 
of a given size $\frac{q}{2}$, then FFD and BFD bin-packing algorithms always use at most $\frac{11}{9} \cdot OPT$ bins 
of an identical size (to place the given $m$ inputs). Since we require at most $\frac{x(x-1)}{2}$ reducers for 
a solution to the A2A mapping schema problem, the algorithm requires at most $(\frac{11}{9} \cdot OPT)^2/2$ 
reducers. 

Note that, in this case, $OPT$ does not indicate the optimal number of reducers to assign 
$m$ inputs that satisfy the A2A mapping schema problem; $OPT$ indicates the optimal number 
of bins of size $\frac{q}{2}$ that are required to place $m$ inputs. 

The following theorem gives the upper bounds that this approximation algorithm achieves 
on the communication cost and the number of reducers. 


Theorem 4.5. (Upper bounds on communication cost and number of reducers 
FOR k = 2) The above algorithm using a bin size 6 = | where q is the reducer capacity achieves 
the following upper bounds: the number of reducers and the communication cost, for the A2A 
mapping schema problem, are at most and at most 4y, respectively, where s is the sum of 
all the input sizes. 

Proof. A bin $i$ can hold inputs whose sum of the sizes is at most $b$. Since the total sum of the sizes is $s$, it is required to divide the inputs into at least $s/b$ bins. Since the FFD or BFD bin-packing algorithm ensures that all the bins (except only one bin) are at least half-full, each bin of size $q/2$ holds inputs whose sum of the sizes is at least $q/4$. Thus, all the inputs can be placed in at most $\frac{s}{q/4} = \frac{4s}{q}$ bins of size $q/2$. Since each bin is considered as a single input, we can assign every two bins to a reducer, and hence, we require at most $\frac{8s^2}{q^2}$ reducers. Since each bin is replicated to at most $\frac{4s}{q}$ reducers, the communication cost is at most $\sum_{1 \le i \le m} w_i \times \frac{4s}{q} = \frac{4s^2}{q}$. □
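The $k = 2$ strategy can be illustrated with a minimal Python sketch. It assumes a plain First-Fit Decreasing packing; the function names and the input sizes are hypothetical and chosen only for illustration. It packs the inputs into bins of size $q/2$, sends every pair of bins to one reducer, and checks the result against the bounds of Theorem 4.5.

```python
from itertools import combinations

def first_fit_decreasing(sizes, bin_size):
    """Pack input indices into bins of capacity bin_size (First-Fit Decreasing)."""
    bins, loads = [], []
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        for b in range(len(bins)):
            if loads[b] + sizes[idx] <= bin_size:
                bins[b].append(idx)
                loads[b] += sizes[idx]
                break
        else:  # no existing bin has room: open a new one
            bins.append([idx])
            loads.append(sizes[idx])
    return bins

def a2a_schema_k2(sizes, q):
    """Treat each bin of size q/2 as one input and send every pair of bins to a reducer."""
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [bins[0]]
    return [a + b for a, b in combinations(bins, 2)]

sizes = [3, 1, 4, 1, 5, 2, 2, 3, 1, 4]  # hypothetical input sizes, all at most q/2
q = 10
reducers = a2a_schema_k2(sizes, q)
s = sum(sizes)
comm = sum(sizes[i] for r in reducers for i in r)
print(len(reducers), comm)  # 15 reducers, communication cost 130
```

For this instance $s = 26$, so the bounds of Theorem 4.5 evaluate to $8s^2/q^2 \approx 54$ reducers and $4s^2/q \approx 270$ units of communication; the sketch stays well below both.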


5. EQUAL-SIZED INPUTS OPTIMAL ALGORITHMS 

As we explained, looking at inputs of the same size makes sense because we imagine the inputs are being bin-packed into bins of size $q/k$, for $k > 2$ (using the bin-packing-based algorithm of Section 4), and that once this is done, we can treat the bins themselves as things of unit size to be sent to the reducers. Thus, in this section, we will shift the notation so that all inputs are of unit size, and $q$ is some small integer, e.g., 3.

In this section, we provide optimal algorithms for $q = 2$ (in Section 5.1) and $q = 3$ (in Section 5.2). Afrati and Ullman [Afrati and Ullman 2013] provided an optimal algorithm for the A2A mapping schema problem where $q$ is a prime number and the number of inputs is $m = q^2$. We extend this algorithm for $m = q^2 + q + 1$ inputs (in Section 5.3), and this extension also meets the lower bound on the communication cost. We will generalize these three algorithms in Sections 6 and 7.

In this setting, by minimizing the number of reducers, we minimize communication, since each reducer is more-or-less filled to capacity. So we define $r(m, q)$ to be the minimum number of reducers of capacity $q$ that can solve the all-pairs problem for $m$ inputs.

The following theorem sets a lower bound on $r(m, q)$ and the communication cost for this setting.

Theorem 5.1. (Lower bounds on the communication cost and number of reducers) For a given reducer capacity $q > 1$ and a list of $m$ inputs, each of size one, the communication cost and the number of reducers ($r(m, q)$), for the A2A mapping schema problem, are at least $m\lceil\frac{m-1}{q-1}\rceil$ and at least $\lceil\frac{m}{q}\lceil\frac{m-1}{q-1}\rceil\rceil$, respectively.

Proof. An input $i$ must meet the other $m - 1$ inputs, and in any reducer it meets at most $q - 1$ of them. Hence, input $i$ is required to be sent to at least $\lceil\frac{m-1}{q-1}\rceil$ reducers, the sum of the number of copies of the $m$ inputs sent to reducers is at least $m\lceil\frac{m-1}{q-1}\rceil$, and this results in at least $m\lceil\frac{m-1}{q-1}\rceil$ communication cost.

There are at least $m\lceil\frac{m-1}{q-1}\rceil$ copies of the $m$ inputs to be sent to reducers, and a reducer can hold at most $q$ inputs; hence, $r(m, q) \ge \lceil\frac{m}{q}\lceil\frac{m-1}{q-1}\rceil\rceil$. □
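The two lower bounds of Theorem 5.1 are easy to evaluate numerically. The following Python sketch is only an illustration (the helper names `ceil_div` and `lower_bounds` are ours); it uses integer arithmetic to avoid rounding issues.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)

def lower_bounds(m, q):
    """Return (communication cost, number of reducers) lower bounds for m unit-size inputs."""
    replication = ceil_div(m - 1, q - 1)   # copies needed per input
    communication = m * replication
    reducers = ceil_div(communication, q)  # each reducer holds at most q copies
    return communication, reducers

print(lower_bounds(15, 3))  # (105, 35)
print(lower_bounds(8, 2))   # (56, 28)
print(lower_bounds(9, 3))   # (36, 12)
```

The values for $(m, q) = (15, 3)$ and $(9, 3)$ coincide with the reducer counts of the constructions in Sections 5.2 and 5.3 (35 and 12 reducers, respectively).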

5.1. Reducer Capacity q = 2

Here, we offer a recursive algorithm and show that this algorithm not only obtains the bound $r(m, 2) \le \frac{m(m-1)}{2}$, but does so in a way that divides the reducers into $m - 1$ "teams" of $\frac{m}{2}$ reducers, where each team has exactly one occurrence of each input. We will use these properties of the output of this algorithm to build an algorithm for $q = 3$ in the next subsection.

The recursive algorithm. We are given a list $A$ of $m$ inputs. The intention is to have all pairs of inputs from list $A$ partitioned into $m - 1$ teams with each team containing exactly $\frac{m}{2}$ pairs and each input appearing exactly once within a team. Hence, we will use $\frac{m(m-1)}{2}$ reducers for assigning pairs of each input.

We split $A$ into two sublists $A_1$ and $A_2$ of size $\frac{m}{2}$ each. Suppose we have the $\frac{m}{2} - 1$ teams for a list of size $\frac{m}{2}$. We will take the $\frac{m}{2} - 1$ teams of $A_1$, the $\frac{m}{2} - 1$ teams of $A_2$, and "mix them up" in a rather elaborate way to form the $m - 1$ teams for $A$:

Let the inputs of $A_1$ and $A_2$ be $\{g_1, g_2, g_3, \ldots, g_{m/2}\}$ and $\{h_1, h_2, h_3, \ldots, h_{m/2}\}$, respectively. We will form two kinds of teams, teams of kind I and teams of kind II, as follows:

Teams of kind I. We will form $\frac{m}{2}$ teams of kind I by taking one input from $A_1$ and one input from $A_2$. For example, the first team for $A$ is $\{(g_1, h_1), (g_2, h_2), (g_3, h_3), \ldots, (g_{m/2}, h_{m/2})\}$, the second team for $A$ is $\{(g_1, h_2), (g_2, h_3), (g_3, h_4), \ldots, (g_{m/2}, h_1)\}$, and so on.

Teams of kind II. We will form the remaining $\frac{m}{2} - 1$ teams having $\frac{m}{2}$ reducers in each. In teams of kind I, each pair (reducer) contains one input from each of the lists $A_1$ and $A_2$. Now we produce pairs with both inputs from $A_1$ or both inputs from $A_2$. In order to do that, we recursively divide $A_1$ into two sublists and perform the operation that we performed for the teams of kind I. The same procedure is recursively implemented on $A_2$. The $i$-th team obtained for $A_1$ and the $i$-th team obtained for $A_2$ together form one team of kind II for $A$.

Example 5.2. For $m = 8$, we form 7 teams. First we form teams of kind I. We divide 8 inputs into two lists $A_1$ and $A_2$. After that, we take one input from $A_1$ and one input from $A_2$, and create 4 teams, see Figure 6. Now, we recursively follow the same rule on each sublist, $A_1$ and $A_2$, and create the 3 remaining teams of kind II, see Figure 6.


        Teams of kind I                  Teams of kind II
Team 1  Team 2  Team 3  Team 4  |  Team 5  Team 6  Team 7
 1,5     1,6     1,7     1,8    |   1,3     1,4     1,2
 2,6     2,7     2,8     2,5    |   2,4     2,3     3,4
 3,7     3,8     3,5     3,6    |   5,7     5,8     5,6
 4,8     4,5     4,6     4,7    |   6,8     6,7     7,8

Fig. 6: The teams for m = 8 and q = 2.



In fact, the teams for this example also appear in Figure 7: in Teams 1 through 7 of Figure 7, the first two entries of each triplet (shown in non-bold face in the figure; notice that they are from 1-8) are exactly these pairs.

The following theorem is easy to prove.

Theorem 5.3. In each team an input appears only once. In each team all inputs appear. There are $m - 1$ teams, which is the minimum possible. Hence this is an optimal mapping schema that assigns inputs to reducers.

This works if the number of inputs is a power of two. We can use known techniques to make it work with good approximation in general.
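The recursive construction can be sketched in a few lines of Python. This is an illustrative sketch under the assumption that the number of inputs is a power of two; the function name `teams_q2` is ours. Teams of kind I are the cyclic shifts between the two halves, and teams of kind II merge the $i$-th team of each half.

```python
def teams_q2(inputs):
    """Return m - 1 teams of m/2 pairs each; len(inputs) must be a power of two (>= 2)."""
    m = len(inputs)
    if m == 2:
        return [[(inputs[0], inputs[1])]]
    half = m // 2
    a1, a2 = inputs[:half], inputs[half:]
    # Kind I: pair a1[i] with a2[i + shift], one team per cyclic shift.
    kind1 = [[(a1[i], a2[(i + shift) % half]) for i in range(half)]
             for shift in range(half)]
    # Kind II: the i-th team of a1 and the i-th team of a2 form one team.
    kind2 = [t1 + t2 for t1, t2 in zip(teams_q2(a1), teams_q2(a2))]
    return kind1 + kind2

teams = teams_q2(list(range(1, 9)))
for number, team in enumerate(teams, start=1):
    print("Team", number, team)
```

For $m = 8$ this prints the seven teams of Figure 6, e.g., Team 1 is (1,5), (2,6), (3,7), (4,8) and Team 5 is (1,3), (2,4), (5,7), (6,8).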


5.2. Reducer Capacity q = 3

Here, we present an algorithm that constructs an optimal mapping schema for $q = 3$. Our recursive algorithm starts by taking the mapping schema constructed in the previous subsection for $q = 2$. We showed there that for $q = 2$, we can not only obtain the bound $r(m, 2) \le \frac{m(m-1)}{2}$, but that we can do so in a way that divides the reducers into $m - 1$ teams of $\frac{m}{2}$ reducers in each team, where each team has exactly one occurrence of each input.

Now, we split the $m$ inputs into two disjoint sets: set $A$ and set $B$. Suppose $m = 2n - 1$. Set $A$ has $n$ inputs and set $B$ has $n - 1$ inputs. We start with the $n$ inputs in set $A$, and create $n - 1$ teams of $\frac{n}{2}$ reducers, each reducer getting two of the $n$ inputs in $A$, by following the algorithm given in Section 5.1. Next, we add to all reducers in one team another input from set $B$, i.e., in a certain team we add to all $\frac{n}{2}$ reducers of this team a certain input from set $B$, and thus, we form a triplet for each reducer.

Since there are $n - 1$ teams, we can handle another $n - 1$ inputs. This is the start of a solution for $q = 3$ and $m = 2n - 1$ inputs. To complete the solution, we add the reducers for solving the problem for the $n - 1$ inputs of the set $B$. That leads to the following recurrence:

$r(m, 3) = \frac{n(n-1)}{2} + r(n - 1, 3)$, where $m = 2n - 1$

$r(3, 3) = 1$

We solve the recurrence for $n$ a power of 2, and it exactly matches the lower bound of $r(m, 3) = \frac{m(m-1)}{6}$. Moreover, notice that we can prove that this case is optimal either by proving that $r(m, 3) = m(m - 1)/6$ (as we did above) or by observing that every pair of inputs meets in exactly one reducer. This is easy to prove. Hence the following theorem:

Theorem 5.4. This algorithm constructs an optimal mapping schema for the reducer 
capacity 3. 
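As a quick numerical check of the recurrence, the following Python sketch (the function name `r3` is ours) unrolls $r(m, 3) = n(n-1)/2 + r(n-1, 3)$ for $m = 2n - 1$ with $n$ a power of two, and compares the result with $m(m-1)/6$.

```python
def r3(m):
    """Number of reducers from the recurrence, for m = 2n - 1 with n a power of two."""
    if m == 3:
        return 1
    n = (m + 1) // 2
    return n * (n - 1) // 2 + r3(n - 1)

for m in (3, 7, 15, 31, 63):
    print(m, r3(m), m * (m - 1) // 6)
```

For each of these values the two columns agree: 1, 7, 35, 155, and 651 reducers, respectively.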


I = {1, 2, ..., 15},  A = {1, 2, ..., 8},  B = {9, 10, ..., 15}

Team 1   Team 2   Team 3   Team 4   Team 5   Team 6   Team 7
1,5,9    1,6,10   1,7,11   1,8,12   1,3,13   1,4,14   1,2,15
2,6,9    2,7,10   2,8,11   2,5,12   2,4,13   2,3,14   3,4,15
3,7,9    3,8,10   3,5,11   3,6,12   5,7,13   5,8,14   5,6,15
4,8,9    4,5,10   4,6,11   4,7,12   6,8,13   6,7,14   7,8,15

I_1 = {9, 10, ..., 15},  A_1 = {9, 10, 11, 12},  B_1 = {13, 14, 15}

Team 8     Team 9     Team 10    An additional reducer
9,11,13    9,12,14    9,10,15    13,14,15
10,12,13   10,11,14   11,12,15

Fig. 7: An example of a mapping schema for q = 3 and m = 15.


Example 5.5. An example is shown in Figure 7. We explained how this figure is constructed for $q = 2$ (the non-bold entries). Now we use the algorithm just presented here to construct the 35 ($= 15 \times \frac{14}{6}$) reducers. We explain below in detail how we construct these 35 reducers.

We are given 15 inputs ($I = \{1, 2, \ldots, 15\}$). We create two sets, namely $A$ of $y = 8$ inputs and $B$ of $x = 7$ inputs, and arrange $(y - 1) \times \lceil\frac{y}{2}\rceil = 28$ reducers in the form of 7 teams of 4 reducers in each team. These 7 teams assign each input of the set $A$ with all other inputs of the set $A$ and all the inputs in the set $B$ as follows. We pair every two inputs of the set $A$ and assign them to exactly one of the 28 reducers as we explained in Section 5.1. Once every pair of the $y = 8$ inputs of the set $A$ is assigned to exactly one of the 28 reducers, we assign input $i$ of the set $B$ to all the four reducers of the $(i - 8)$-th team. Thus, e.g., input 10 is assigned to the four reducers of Team 2.

Now these 28 reducers ensure that each pair of inputs from set $A$ meets in at least one reducer and each pair of inputs, one from $A$ and one from $B$, meets in at least one reducer. Thus, it remains to build more reducers so that each pair of inputs (both) from set $B$ meets. According to the recursion we explained, we break set $B$ into sets $A_1$ and $B_1$, of size 4 and 3 respectively, and we apply our method again. In particular, we create two sets, $A_1 = \{9, 10, 11, 12\}$ of $y_1 = 4$ inputs and $B_1 = \{13, 14, 15\}$ of $x_1 = 3$ inputs. Then, we arrange $(y_1 - 1) \times \lceil\frac{y_1}{2}\rceil = 6$ reducers in the form of 3 teams of 2 reducers in each team. We assign each pair of inputs of the set $A_1$ to these 6 reducers, and then each input of the set $B_1$ to both reducers of one team, see Team 8 to Team 10.

The last team is constructed so that all inputs in $B_1$ meet at the same reducer (since $B_1$ has only 3 elements and 3 is the size of a reducer, one reducer suffices for this to happen).
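The construction of Example 5.5 can be reproduced with a short Python sketch. It assumes $m = 2n - 1$ with $n$ a power of two; the function names `teams_q2` and `schema_q3` are ours. The helper `teams_q2` is the recursive team construction for $q = 2$ from Section 5.1.

```python
from itertools import combinations

def teams_q2(inputs):
    """Teams for q = 2; len(inputs) must be a power of two (>= 2)."""
    m = len(inputs)
    if m == 2:
        return [[(inputs[0], inputs[1])]]
    half = m // 2
    a1, a2 = inputs[:half], inputs[half:]
    kind1 = [[(a1[i], a2[(i + s) % half]) for i in range(half)] for s in range(half)]
    kind2 = [t1 + t2 for t1, t2 in zip(teams_q2(a1), teams_q2(a2))]
    return kind1 + kind2

def schema_q3(inputs):
    """Reducers of capacity 3 for m = 2n - 1 inputs, n a power of two."""
    if len(inputs) == 3:
        return [tuple(inputs)]
    n = (len(inputs) + 1) // 2
    a, b = inputs[:n], inputs[n:]
    reducers = []
    # The i-th input of B joins every reducer of the i-th team built on A.
    for team, extra in zip(teams_q2(a), b):
        reducers += [(x, y, extra) for x, y in team]
    return reducers + schema_q3(b)  # recurse on B

reducers = schema_q3(list(range(1, 16)))
print(len(reducers))              # 35
print(reducers[0], reducers[-1])  # (1, 5, 9) (13, 14, 15)
```

Each of the 105 pairs of inputs appears in exactly one of the 35 reducers, which is why the schema meets the lower bound.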


Open problem. Now the interesting observation is that if we can argue that the resulting reducers can be divided into teams of $\frac{m}{3}$ reducers each (with each team having one occurrence of each input), then we can extend the idea to $q = 4$, and perhaps higher.


5.3. When q or q - 1 is a prime number

An algorithm to provide a mapping schema for the reducer capacity $q$, where $q$ is a prime number, and $m = q^2$ inputs is suggested by Afrati and Ullman in [Afrati and Ullman 2013]. This method meets the lower bounds on the communication cost. We call this algorithm the AU method. For the sake of completeness, we provide an overview of the AU method. Interested readers may refer to [Afrati and Ullman 2013].


The AU method. We divide the $m$ inputs into $q^2$ equal-sized subsets (each with $\frac{m}{q^2}$ inputs) that are arranged in a $Q = q \times q$ square. The subset in row $i$ and column $j$ is represented by $S_{i,j}$, where $0 \le i < q$ and $0 \le j < q$.

We now organize $q(q + 1)$ reducers in the form of $q + 1$ teams of $q$ players (or reducers) in each team. Note that the sum of sizes of the inputs in each row and column of the $Q$ square is exactly $q$.

The teams are numbered from 0 to $q$, and the reducers of each team are numbered from 0 to $q - 1$. We first assign inputs to team $q$. Since the sum of the sizes in each column of the $Q$ square is $q$, we place one column of the $Q$ square to one reducer of team $q$. Now we place the inputs to the remaining teams. We use the modulo operation for the assignment of each subset to each team. The subset $S_{i,j}$ is assigned to reducer $r$ of each team $t$, $0 \le t < q$, such that $(i + tj) \bmod q = r$. An example for $q = 3$ and $m = 9$ is given in Figure 8.


The Q square for q = 3 (panels of the figure: Team 0, Team 1, Team 2, Team 3):

S_{0,0}  S_{0,1}  S_{0,2}
S_{1,0}  S_{1,1}  S_{1,2}
S_{2,0}  S_{2,1}  S_{2,2}

Fig. 8: The AU method for the reducer capacity q = 3 and m = 9.

Total required reducers. The AU method uses $q(q + 1)$ reducers, which are organized in the form of $q + 1$ teams of $q$ reducers in each team, and the communication cost is $q^2(q + 1)$.
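The modulo rule of the AU method translates directly into code. The following Python sketch is an illustration for $m = q^2$ unit-sized inputs (so each subset $S_{i,j}$ is a single input, represented here by the tuple `(i, j)`); the function name `au_method` is ours.

```python
from itertools import combinations

def au_method(q):
    """Return q + 1 teams of q reducers for q*q unit-size inputs; q must be prime."""
    teams = []
    for t in range(q):
        team = [[] for _ in range(q)]
        for i in range(q):
            for j in range(q):
                team[(i + t * j) % q].append((i, j))  # S_{i,j} goes to reducer r = (i + t*j) mod q
        teams.append(team)
    # Team q: one column of the square per reducer.
    teams.append([[(i, j) for i in range(q)] for j in range(q)])
    return teams

teams = au_method(3)
reducers = [r for team in teams for r in team]
communication = sum(len(r) for r in reducers)
print(len(teams), len(reducers), communication)  # 4 teams, 12 reducers, cost 36
```

Because $q$ is prime, two inputs in different columns satisfy $(i + tj) \equiv (i' + tj') \pmod q$ for exactly one team $t$, and two inputs in the same column meet in team $q$; hence every pair meets exactly once.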


A simple extension of the AU method. Now, we can extend the AU method as follows: we can add $q + 1$ additional inputs, add one to each reducer (the same new input to all the reducers of a team) and add one more reducer that has the $q + 1$ new inputs. That gives us reducers of size $q + 1$ and $m = q^2 + q + 1$ inputs, or $r(q^2 + q + 1, q + 1) = q(q + 1) + 1 = q^2 + q + 1$. If you substitute $m = q^2 + q + 1$ and the capacity $q + 1$ in place of $q$, you can check that this also meets the bound of $r(m, q) = \frac{m(m-1)}{q(q-1)}$. In Figure 9 we show a mapping schema for this extension to the AU method for $q = 4$ and $m = 13$ (i.e., $3^2 + 3 + 1$ inputs), which includes an extra reducer for the new inputs.

Fig. 9: An optimum mapping schema for q = 4 and m = 13 by extending the AU method.
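The extension can be checked with a small Python sketch. Here `p` denotes the prime used by the underlying AU method (so the reducer capacity is `p + 1`), and the labels `("new", t)` for the added inputs are hypothetical names used only in this illustration.

```python
from itertools import combinations

def au_extended(p):
    """AU method on p*p inputs plus p + 1 new inputs; reducers have capacity p + 1."""
    reducers = []
    for t in range(p + 1):
        for r in range(p):
            if t < p:
                cells = [(i, j) for i in range(p) for j in range(p) if (i + t * j) % p == r]
            else:
                cells = [(i, r) for i in range(p)]  # team p: one column per reducer
            reducers.append(cells + [("new", t)])   # new input t joins every reducer of team t
    reducers.append([("new", t) for t in range(p + 1)])  # the extra reducer
    return reducers

schema = au_extended(3)
inputs = {x for r in schema for x in r}
print(len(schema), len(inputs))  # 13 reducers of capacity 4 for 13 inputs
```

For `p = 3` the sketch yields $3^2 + 3 + 1 = 13$ reducers of capacity 4, and each of the 78 pairs of the 13 inputs meets in exactly one reducer.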


In conclusion, in this section we have shown the following:

Theorem 5.6. We can construct optimal mapping schemas for the following cases:

(1) $q = 2$.

(2) $q = 3$.

(3) $q$ being a prime number and $m = q^2$.

(4) $q - 1$ being a prime number and $m = (q - 1)^2 + q$,

where $q$ is the reducer capacity and $m$ is the number of inputs.

Open problem: Can we generalize the last idea to get optimal schemas for more cases?

Approximation Algorithms for the A2A Mapping Schema Problem. We can use the optimal mapping schemas of Section 5 to construct good approximations of mapping schemas in many cases. The general techniques that we will use move along the following dimensions/ideas:

- Assuming that there are no inputs of size greater than $q/k$, construct bins of size $q/k$, treat each of the bins as a single input of size 1, and assume the reducer capacity is $k$. Then apply one of the optimal techniques of Section 5 to construct a mapping schema. These algorithms are presented in Sections 6.1 and 6.2.

- Getting inspiration from the methods developed (or only presented, in the case of the AU method) in Section 5.3, we extend the ideas to construct good approximation algorithms for inputs that are all of equal size (see Sections 7.1 and 7.2).

Thus, in Sections 6 and 7 we will give several such techniques and show that some of them construct mapping schemas close to the optimal. To that end, we have already shown a schema based on bin-packing algorithms in Section 4.

6. GENERALIZING THE TECHNIQUE FOR THE REDUCER CAPACITY q > 3 AND INPUTS OF SIZE ≤ q/k, k > 3

In this section, we will generalize the algorithm for $q = 3$ given in Section 5.2 and present an algorithm (Algorithm 1) for inputs of size less than or equal to $q/k$ and $k > 3$. For simplicity, we assume that $k$ divides $q$ evenly throughout this section.

6.1. Algorithm 1A

We divide Algorithm 1 into two parts based on the value of $k$ as even or odd. Algorithm 1A considers that $k$ is an odd number. Pseudocode of Algorithm 1A is given in the Appendix. Algorithm 1A works as follows:

It first places all the given inputs, say $m'$, to some bins, say $m$, each of size $q/k$, where $k > 3$ is an odd number. Thus, a reducer can hold an odd number of bins. After placing all the $m'$ inputs to $m$ bins, we can treat each of the $m$ bins as a single input of size one and the reducer capacity to be $k$. Now, it is easy to turn the problem to a case similar to the case of $q = 3$. Hence, we divide the $m$ bins into two sets $A$ and $B$, and follow a similar approach as given in Section 5.2.

Aside. Equivalently, we can consider $q$ to be odd and the inputs to be of unit size. In what follows, we will continue to use $q$, which is an odd number, as the reducer capacity and assume all inputs (that are actually bins containing inputs) are of unit size.

Example 6.1. If $q = 30$ and $k = 5$, then we can pack the given inputs to some bins of size 6. Hence, a reducer can hold 5 bins. Equivalently, we may consider each of the bins as a single input of size 1 and $q = 5$.

For understanding of Algorithm 1A, an example for $q = 5$ is presented in Figure 10, where we obtain $m = 23$ bins (that are considered as 23 unit-sized inputs) after implementing a bin-packing algorithm to the given inputs.


I = {1, 2, ..., 23},  A = {1, 2, ..., 16},  B = {17, 18, ..., 23}

Derived inputs of A: (1,2), (3,4), (5,6), (7,8), (9,10), (11,12), (13,14), (15,16)

Team with input 17       Team with input 18       ...   Team with input 23
(1,2), (9,10), 17        (1,2), (11,12), 18             (1,2), (3,4), 23
(3,4), (11,12), 17       (3,4), (13,14), 18             (5,6), (7,8), 23
(5,6), (13,14), 17       (5,6), (15,16), 18             (9,10), (11,12), 23
(7,8), (15,16), 17       (7,8), (9,10), 18              (13,14), (15,16), 23

Additional reducers for the set B:
(17,18), (19,20), 21     (17,18), (19,20), 22     (17,18), (19,20), 23     21, 22, 23

Fig. 10: Algorithm 1A - an example of a mapping schema for q = 5 and 23 bins.


Algorithm 1A consists of six steps as follows:

(1) Implement a bin-packing algorithm: Implement a bin-packing algorithm to place all the given $m'$ inputs to bins of size $q/k$, where $k > 3$ is an odd number and the size of all the inputs is less than or equal to $q/k$. Let $m$ bins be obtained; each of the bins is now considered as a single input.

(2) Division of bins (or inputs) to two sets, A and B: Divide the $m$ inputs into two sets $A$ and $B$ of size $y = \lfloor\frac{q}{2}\rfloor(\lfloor\frac{2m}{q+1}\rfloor + 1)$ and $x = m - y$, respectively.

(3) Grouping of inputs of the set A: Group the $y$ inputs into $u = \lceil\frac{y}{q - \lceil q/2 \rceil}\rceil$ disjoint groups, where each group holds $\lceil\frac{q-1}{2}\rceil$ inputs. (We consider each of the $u$ disjoint groups as a single input that we call the derived input. By making $u$ disjoint groups (or derived inputs) of the $y$ inputs of the set $A$, we turn the case of any odd value of $q$ to a case where a reducer can hold only three inputs: the first two inputs are a pair of derived inputs and the third input is from the set $B$.)

(4) Assigning groups (inputs of the set A) to some reducers: Organize $(u - 1) \times \lceil\frac{u}{2}\rceil$ reducers in the form of $u - 1$ teams of $\lceil\frac{u}{2}\rceil$ reducers in each team. Assign every two groups to one of the $(u - 1) \times \lceil\frac{u}{2}\rceil$ reducers. To do so, we will prove the following Lemma 6.2.

Lemma 6.2. Let $q$ be the reducer capacity. Let the size of an input be $\lceil\frac{q-1}{2}\rceil$. Each pair of $u = 2^i$, $i > 0$, inputs can be assigned to $2^i - 1$ teams of $2^{i-1}$ reducers in each team.

(5) Assigning inputs of the set B to the reducers: Once every pair of the derived inputs is assigned, assign the $i$-th input of the set $B$ to all the reducers of the $i$-th team.

(6) Use previous steps on the inputs of the set B: Apply (the above mentioned) steps 1-4 on the set $B$ until there is a solution to the A2A mapping schema problem for the $x$ inputs.
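Steps (3)-(5) can be sketched in Python for unit-sized bins and an odd capacity $q$. This is only an illustration of the first level of the algorithm, assuming that the number of derived inputs is a power of two and that the sets $A$ and $B$ are passed in explicitly; the recursion on $B$ (step 6) is omitted, and the function names are ours.

```python
from itertools import combinations

def teams_q2(items):
    """Teams for capacity 2 (Section 5.1); len(items) must be a power of two (>= 2)."""
    m = len(items)
    if m == 2:
        return [[(items[0], items[1])]]
    half = m // 2
    a1, a2 = items[:half], items[half:]
    kind1 = [[(a1[i], a2[(i + s) % half]) for i in range(half)] for s in range(half)]
    kind2 = [t1 + t2 for t1, t2 in zip(teams_q2(a1), teams_q2(a2))]
    return kind1 + kind2

def first_level_1a(a, b, q):
    """Steps (3)-(5): group A into derived inputs, pair them in teams, add one B input per team."""
    half = q // 2
    groups = [tuple(a[i:i + half]) for i in range(0, len(a), half)]  # derived inputs
    reducers = []
    for team, extra in zip(teams_q2(groups), b):
        reducers += [g1 + g2 + (extra,) for g1, g2 in team]
    return reducers

a = list(range(1, 17))   # set A of the example in Figure 10
b = list(range(17, 24))  # set B of the example in Figure 10
reducers = first_level_1a(a, b, 5)
print(len(reducers), reducers[0])  # 28 reducers; the first is (1, 2, 9, 10, 17)
```

After this level, every pair with at least one input in $A$ meets in some reducer; only the pairs inside $B$ remain, which is what step (6) handles.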


Theorem 6.3 (The communication cost obtained using Algorithm 1). For a given reducer capacity $q > 1$, $k > 3$, and a list of $m$ inputs whose sum of sizes is $s$, the communication cost, for the A2A mapping schema problem, is at most $\frac{q}{2k}\lceil\frac{sk}{q(k-1)}\rceil(\lceil\frac{sk}{q(k-1)}\rceil - 1)$.


Proof. Since the FFD or BFD bin-packing algorithm ensures that all the bins (except only one bin) are at least half-full, each bin of size $q/k$ holds inputs whose sum of the sizes is at least $\frac{q}{2k}$. Thus, all the inputs can be placed in at most $x = s/(q/(k/2)) = \frac{sk}{2q}$ bins of size $q/k$. Now, each bin can be considered as an identical-sized input.

According to the construction given in Algorithm 1A, there are at most $g = \lceil\frac{2x}{k-1}\rceil$ groups (derived inputs) of the given $x$ bins. In order to assign each pair of the derived inputs, each derived input is required to be assigned to at most $g - 1$ reducers. In addition, the size of each input (bin) is $q/k$; therefore, we have at most

$\frac{q}{k} \times \frac{g(g-1)}{2} = \frac{q}{2k}\lceil\frac{2x}{k-1}\rceil(\lceil\frac{2x}{k-1}\rceil - 1) = \frac{q}{2k}\lceil\frac{sk}{q(k-1)}\rceil(\lceil\frac{sk}{q(k-1)}\rceil - 1)$

communication cost. □

Algorithm correctness. The algorithm correctness appears in Appendix B.


4.31 is 


( sk _ 

— 2 , 


6.3 1 is 


Approximation factor. The optimal communication cost (from Theorem 
l)/fc — IJ « Y • and the communication cost of the algorithm (from Theorem 
^ [ q(k-i) ] ( r qik-i) ] ~ 1) ~ s'^k/qfk — 1)^. Thus, the ratio between the optimal communication 
and the communication of our mapping schema is approximately 


Footnote to step (3): We suppose that $u$ is a power of 2. In case $u$ is not a power of 2 and $u > q$, we add dummy inputs, each of size $\lceil\frac{q-1}{2}\rceil$, so that $u$ becomes a power of 2. Consider that we require $d$ dummy inputs. If the groups of inputs of the set $B$, each of size $\lceil\frac{q-1}{2}\rceil$, are fewer than or equal to the $d$ dummy inputs, then we use inputs of the set $B$ in place of dummy inputs, and the set $B$ will be empty.

Footnote to Lemma 6.2: The proof appears in Appendix A.

6.2. Algorithm 1B

For the sake of completeness, we include the pseudocode of the algorithm for handling the case when $k$ is an even number. We call it Algorithm 1B, and its pseudocode is given in the Appendix. In this algorithm, we are given $m'$ inputs of size less than or equal to $q/k$, and $k > 4$ is an even number.

Similar to Algorithm 1A, Algorithm 1B first places all the $m'$ inputs to $m$ bins, each of size $q/k$, where $k > 2$ is an even number. Thus, a reducer can hold an even number of bins. After placing all the $m'$ inputs to $m$ bins, we can treat each of the $m$ bins as a single input of size one and the reducer capacity to be $k$. Now, we easily turn this problem to a case similar to the case of $q = 2$. Hence, we divide the $m$ bins into two sets $A$ and $B$, and follow a similar approach as given in Section 5.1.


Example 6.4. If $q = 30$ and $k = 6$, then we can pack the given inputs to some bins of size 5. Hence, a reducer can hold 6 bins. Equivalently, we may consider each of the bins as a single input of size 1 and $q = 6$.


Note. The choice between Algorithms 1A and 1B depends on how well the inputs pack into bins of size $q/k$ for an odd or an even $k$. To understand this point, consider $q = 30$ and $m' = 46$. For simplicity, we assume that all the inputs are of size three. Now, consider $k = 5$, so we will use 23 bins each of size 6 and apply Algorithm 1A. On the other hand, consider $k = 6$, so we will use 46 bins each of size 5 and apply Algorithm 1B.


7. GENERALIZING THE AU METHOD 

In this section, we extend the AU method (Section 5.3) to handle more than $q^2$ inputs, when $q$ is a prime number. Recall that the AU method can assign each pair of $q^2$ inputs to reducers of capacity $q$. We provide two extensions: (i) take $m = p^2 + p \cdot l + l$ identical-sized inputs and assign these inputs to reducers of capacity $p + l = q$, where $p$ is the nearest prime number to $q$, in Section 7.1, and (ii) take $m = q^l$ inputs, where $l > 2$, and assign the inputs to reducers of capacity $q$, in Section 7.2.


7.1. When we consider the nearest prime to q

We provide an extension to the AU method that handles $m = p^2 + p \cdot l + l$ identical-sized inputs and assigns them to reducers of capacity $p + l = q$, where $p$ is the nearest prime number to $q$. We call it the first extension to the AU method (Algorithm 2).

Algorithm 2: The First Extension of the AU method. We extend the AU method by increasing the reducer capacity and the number of inputs. Consider that the AU method assigns $p^2$ identical-sized inputs to reducers of capacity $p$, where $p$ is a prime number. We add $l(p + 1)$ inputs and increase the reducer capacity to $p + l$ ($= q$).

In other words, $m$ identical-sized inputs and the reducer capacity $q$ are given. We select a prime number, say $p$, that is nearest to $q$ such that $p + l = q$ and $p^2 + l(p + 1) \le m$. Also, we divide the $m$ inputs into two disjoint sets $A$ and $B$, where $A$ holds at most $p^2$ inputs and $B$ holds at most $l(p + 1)$ inputs.

Algorithm 2 consists of six steps, where the $m$ inputs and the reducer capacity $q$ are inputs to Algorithm 2, as follows:

(1) Divide the given $m$ inputs into two disjoint sets $A$ of $y = p^2$ inputs and $B$ of $x = m - y$ inputs, where $p$ is the nearest prime number to $q$ such that $p + l = q$ and $p^2 + l(p + 1) \le m$.

(2) Perform the AU method on the inputs of the set $A$ by placing the $y$ inputs to $p + 1$ teams of $p$ bins in each team, where the size of each bin is $p$.

(3) Organize $p(p + 1)$ reducers in the form of $p + 1$ teams of $p$ reducers in each team, and assign the $j$-th bin of the $i$-th team of bins to the $j$-th reducer of the $i$-th team of reducers.

(4) Group the $x$ inputs of the set $B$ into $u = \lceil\frac{x}{q-p}\rceil$ disjoint groups.

(5) Assign the $i$-th group to all the reducers of the $i$-th team.

(6) Use Algorithm 1A or Algorithm 1B to make each pair of inputs of the set $B$ meet, depending on the value of $q$, which is either an odd or an even number, respectively.

Note that when we perform the above mentioned step 3, we assign each pair of inputs of the set $A$ to the $p(p + 1)$ reducers, and such an assignment uses $p$ capacity of each reducer. Now, each of the $p(p + 1)$ reducers has $q - p$ remaining capacity that is used to assign the $i$-th group of inputs of the set $B$. In this manner, all the inputs of the set $A$ are assigned with all the $m$ inputs.
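Steps (1)-(5) can be illustrated by the following Python sketch for unit-sized inputs. The labels `("A", i, j)` and `("B", t, k)` are hypothetical names for the inputs of the two sets, and step (6), which makes the inputs of $B$ from different groups meet, is deliberately left out.

```python
from itertools import combinations

def algorithm_2_steps_1_to_5(p, l):
    """AU method on p*p inputs of A; the t-th group of l inputs of B joins every reducer of team t."""
    reducers = []
    for t in range(p + 1):
        group = [("B", t, k) for k in range(l)]  # t-th group of the set B
        for r in range(p):
            if t < p:
                cells = [("A", i, j) for i in range(p) for j in range(p)
                         if (i + t * j) % p == r]
            else:
                cells = [("A", i, r) for i in range(p)]  # team p: one column per reducer
            reducers.append(cells + group)
    return reducers

p, l = 7, 2  # reducer capacity q = p + l = 9
reducers = algorithm_2_steps_1_to_5(p, l)
print(len(reducers), len(reducers[0]))  # 56 reducers, each holding 9 inputs
```

With $p = 7$ and $l = 2$ there are $p^2 + l(p+1) = 65$ inputs; after these steps, every pair with at least one input from $A$ already meets in one of the $p(p+1) = 56$ reducers.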


Algorithm correctness. The algorithm correctness appears in Appendix D.


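Theorem 7.1 (The communication cost obtained using Algorithm 2). Algorithm 2 requires at most $p(p + 1) + z$ reducers, where $z$ is the number of reducers that Algorithm 1A or 1B uses to assign each pair of inputs of the set $B$, and results in at most $qp(p + 1) + z'$ communication cost, where $z'$ is the communication cost that Algorithm 1A or 1B requires for the inputs of the set $B$, $q$ is the reducer capacity, and $p$ is the nearest prime number to $q$.

When $l = q - p$ equals one, we have provided an extension of the AU method in Section 5.3, and in this case, we have an optimum mapping schema for the reducer capacity $q = p + 1$ and $m = p^2 + p + 1$ inputs.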


Proof. In case of $l > 1$, a single reducer cannot be used to assign all the inputs of the set $B$. Since Algorithm 2 is based on the AU method, Algorithm 1A, and Algorithm 1B, we always use at most $p(p + 1) + z$ reducers, where $z$ reducers are used to assign each pair of inputs of the set $B$ based on Algorithm 1A or 1B (for the value of $z$, the reader may refer to Theorem 11 of the technical report [Afrati et al.]). Thus, the communication cost is at most $qp(p + 1) + z'$, where $z'$ is the maximum communication cost required by Algorithm 1A or 1B for assigning the $(p + 1)l$ inputs of the set $B$. □


Approximation factor. The optimal communication cost using the AU method is $q^2(q + 1)$. Thus, the difference between the communication of our mapping schema ($q^2(q + 1) + z'$, when assuming $p$ is equal to $q$) and the optimal communication is $z'$. We can see two cases, as follows:

(1) When $q$ is large. Consider that $q$ is greater than the square or cube of the maximum difference between any two prime numbers. In this case, $z'$ will be very small, and we will get an almost optimal ratio.

(2) When $q$ is very small. In this case, $z'$ plays a role as follows: here, the number of inputs in the set $B$ will be at most $(p + 1)l < q^2$. Thus, the ratio becomes $q/(q + 1)$.


7.2. For input size m = q^l where q is a prime number

We also provide another extension to the AU method that handles $m = q^l$ identical-sized inputs and assigns them to reducers of capacity $q$, where $q$ is a prime number and $l > 2$. We call it the second extension to the AU method (Algorithm 3).

Algorithm 3: The Second Extension of the AU method. The second extension to the AU method (Algorithm 3) handles the case when $m = q^l$, where $l > 2$ and $q$ is a prime number. We present Algorithm 3 for $m = q^l$, $l > 2$, inputs and the reducer capacity $q$, where $q$ is a prime number. Nevertheless, a number of inputs that is less than but close to $q^l$ can also be handled by Algorithm 3 by adding dummy inputs such that $m = q^l$, $l > 2$.

Algorithm 3 consists of two phases, as follows:

The first phase: creation of a bottom-up tree. Here, we present a simple example of the creation of the bottom-up tree for $q = 3$ and $m = 3^4$; see Figure 11.

Example 7.2 (Bottom-up tree creation). A bottom-up tree for $m = q^l = 3^4$ identical-sized inputs and $q = 3$ is given in Figure 11. Here, we explain how we constructed it.

[Figure 11 shows the tree with Level 1, Level 2, and Level 3.]

Fig. 11: The second extension of the AU method (Algorithm 3): Phase 1 - Creation of the bottom-up tree.


The height of the bottom-up tree is $l - 1$, and the last, $(l-1)$-th, level has the $m$ inputs in the form of $\frac{m}{q^2}$ matrices of size $q \times q$. Note that we have $\frac{m}{q}$ columns at the last level, which hold the $m$ inputs; these $\frac{m}{q}$ columns are called the input columns. We create the tree in bottom-up fashion, where the $(l-2)$-th level has $\frac{m}{q^3}$ matrices, each cell value of which refers to an input column of the $(l-1)$-th level. We use the notation $c^i_j$ to refer to a column of the $i$-th level, where $j$ is the column index. Note that each column, $c^i_j$, at level $i$ holds $q$ columns $c^{i+1}_{(j-1)q+1}, c^{i+1}_{(j-1)q+2}, \ldots, c^{i+1}_{jq}$ of the $(i+1)$-th level. In general, there are $\frac{m}{q^{l-i+1}}$ matrices at level $i$, each cell value of which, $c^i_j$, refers to a column, $c^{i+1}_j$, of the $(i+1)$-th level.

Accordingly, the bottom-up tree for $m = 3^4$ identical-sized inputs and $q = 3$ has height 3. The last level (the $(l-1)$-th, i.e., the 3rd) has 81 inputs in the form of $\frac{m}{q^2} = 9$ matrices of size $3 \times 3$. Note that we have $\frac{m}{q} = 27$ columns at the 3rd level, called the input columns. The $(l-2)$-th, i.e., the 2nd, level has $\frac{m}{q^3} = 3$ matrices, each column of which, $c^2_j$, refers to $q = 3$ columns ($c^3_{(j-1)q+1}, c^3_{(j-1)q+2}, \ldots, c^3_{jq}$) of the 3rd level. Further, the root node is at level 1, each column of which, $c^1_j$, refers to $q = 3$ columns ($c^2_{(j-1)q+1}, c^2_{(j-1)q+2}, \ldots, c^2_{jq}$) of the 2nd level.

The second phase: creation of an assignment tree. The assignment tree is created in top-down fashion. Our objective is to assign each pair of inputs to a reducer, where the inputs are arranged in the input columns of the bottom-up tree. If we can assign each pair of input columns (of the bottom-up tree) in the form of ($q \times q$)-sized matrices, then the implementation of the AU method on each such matrix results in an assignment of every pair of inputs to reducers. Hence, we try to make pairs of all the input columns, by creating a tree called the assignment tree.

Here, we present a simple assignment tree for $m = 3^4$ and $q = 3$ (see Figure 12).

Example 7.3 (Assignment tree creation). The root node of the bottom-up tree becomes the root node of the assignment tree. Recall that the root node of the bottom-up tree is a $q \times q$ matrix. First, consider the root node to understand the working of the AU method to create the assignment tree. Consider that each cell value of the root node matrix is of size one, and we have $(q + 1)$ teams of $q$ bins (of size $q$) in each team. Our objective in using the AU method on the root node matrix is to assign the cell values to the $q(q + 1)$ bins so that every pair of cell values $(c^2_i, c^2_j)$ meets in a bin.

Now, we create matrices by using these bins (the bins created by the AU method's implementation on the root node), which hold the indices of columns of the second level ($c^2_j$) of the bottom-up tree. We take each bin and its $q$ indices, and we replace each index $c^2_j$ with the $q$ columns $c^3_{(j-1)q+1}, \ldots, c^3_{jq}$; this results in $q(q + 1)$ matrices of size $q \times q$, and these $q(q + 1)$ matrices become child nodes of the root node. Now, we consider each such matrix separately and perform a similar operation as we did for the root node.

In this manner, the AU method creates $(q(q + 1))^{i-1}$ child nodes (that are matrices of size $q \times q$) at the $i$-th level of the assignment tree, and they create $(q(q + 1))^{i}$ child nodes (matrices of size $q \times q$) at the $(i+1)$-th level of the assignment tree.

Fig. 12: The second extension of the AU method (Algorithm 3): Phase 2 - Creation of the assignment tree.

Recall that there are $\frac{m}{q}$ input columns at the $(l-1)$-th level of the bottom-up tree that hold the original $m$ inputs. The implementation of the AU method on each node (a $q \times q$ matrix) of the $(l-2)$-th level of the assignment tree assigns each pair of input columns at the $(l-1)$-th level of the assignment tree. Further, the AU method's implementation on each matrix of the $(l-1)$-th level assigns every pair of the original inputs to $q^l \times (q + 1)^{l-1}$ reducers at the $l$-th level, which has the reducers in the form of $(q(q + 1))^{l-1}$ teams of $q$ reducers in each team.

For $m = 3^4$ identical-sized inputs and $q = 3$, we take the root node of the bottom-up tree (Figure 11), which becomes the root node of the assignment tree. We implement the AU method on the root node and assign each pair of cell values ($c^2_j$, $1 \le j \le 9$) to a bin of size $q$. Each cell value of the bins ($c^2_j$) is then replaced by $q = 3$ columns ($c^3_{(j-1)q+1}, \ldots, c^3_{jq}$), which results in an assignment of each pair of columns of the second level of the bottom-up tree. For clarity, we are not showing the bins. For the next, 3rd, level, we again implement the AU method on all 12 matrices at the 2nd level and get 144 matrices at the third level. The matrices at the 3rd level are pairs of the input columns (of the bottom-up tree). The AU method's implementation on each matrix of the 3rd level assigns each pair of original inputs to reducers. For clarity, we are only showing all the matrices and teams at levels 3 and 4, respectively.


The assignment tree uses the root node of the bottom-up tree, and we implement the AU method on the root node, which results in $q(q + 1)$ child nodes at level two. Each child node is a $q \times q$ matrix, and the columns of all the $q(q + 1)$ matrices provide all pairs of the cell values of the root node matrix. At level $i$, the assignment tree has $(q(q + 1))^{i-1}$ nodes; see Figure 13. The height of the assignment tree is $l$, where the $(l - 1)$-th level has all pairs of input columns and the $l$-th level gives a solution to the A2A mapping schema problem for $m$ inputs.



Fig. 13: An assignment tree created using Algorithm 3. 


Algorithm correctness. Algorithm 3 satisfies the following Lemma 7.4.

Lemma 7.4. The height of the assignment tree is $l$, and the $l$-th level of the assignment tree assigns each pair of inputs to reducers.


Theorem 7.5 (The communication cost obtained using Algorithm 3).
Algorithm 3 requires at most $q \times (q(q + 1))^{l-1}$ reducers and results in at most $q^2 \times (q(q + 1))^{l-1}$ communication cost, where $q$ is the reducer capacity and $l > 2$.


Proof. For a given $m = q^l$, $l > 2$, the assignment tree has height $l$ (Lemma 7.4), and (according to Algorithm 3) the $l$-th level has $q \times (q(q + 1))^{l-1}$ reducers providing an assignment of each pair of inputs. Hence, Algorithm 3 uses $q(q(q + 1))^{l-1}$ reducers, and the communication cost is at most $q^2 \times (q(q + 1))^{l-1}$. □


Approximation factor. The optimal communication is $m(m - 1)/(q - 1)$ (see Theorem 5.11). Replacing $m$ with $q^l$ we get $q^l(q^l - 1)/(q - 1)$. Thus, the ratio between the optimal communication and the communication of our mapping schema is $(q^l - 1)/\bigl(q(q - 1)(q + 1)^{l-1}\bigr)$. We can see two cases:

(1) When $q$ is large. Then we drop the constant 1 and the ratio is approximately equal to $1/q$.

(2) When $q$ is very small compared to $q^l$. Then the ratio is $q^l/\bigl(q(q - 1)(q + 1)^{l-1}\bigr)$.

For $q = 5$, the inverse of the ratio is approximately $(6/5)^{l-1}$. This is already acceptable for practical applications if we think that the size of data is $5^l$; thus $l$ may as well be $l = 9$, in which case this ratio is approximately 4.3. For $q = 2$ and $q = 3$ we already have optimal mapping schemas. Our conjecture is that there are optimal schemas for $q = 4$ and $q = 5$ even by using the techniques developed and presented here.
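The counts in Theorem 7.5 and the growth factor quoted above are easy to evaluate numerically. The following is only an illustrative sketch: the function names are hypothetical, and the formulas are the ones stated in the text, not a new analysis.

```python
def nodes_at_level(q: int, i: int) -> int:
    """Number of nodes at level i of the assignment tree: (q(q+1))^(i-1)."""
    return (q * (q + 1)) ** (i - 1)


def reducers_upper_bound(q: int, l: int) -> int:
    """Theorem 7.5: at most q * (q(q+1))^(l-1) reducers."""
    return q * (q * (q + 1)) ** (l - 1)


def communication_upper_bound(q: int, l: int) -> int:
    """Theorem 7.5: at most q^2 * (q(q+1))^(l-1) communication cost."""
    return q * reducers_upper_bound(q, l)


def growth_factor(q: int, l: int) -> float:
    """The factor ((q+1)/q)^(l-1) used in the discussion for q = 5."""
    return ((q + 1) / q) ** (l - 1)


if __name__ == "__main__":
    # q = 3: 12 matrices at level two and 144 at level three, as in the example.
    print(nodes_at_level(3, 2), nodes_at_level(3, 3))
    print(reducers_upper_bound(3, 4), communication_upper_bound(3, 4))
    print(round(growth_factor(5, 9), 1))
```

For $q = 3$ the node counts reproduce the 12 and 144 matrices of the worked example, and for $q = 5$, $l = 9$ the growth factor rounds to 4.3.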


Open problem: In this section, we provided two algorithms for two different cases extending the AU method. Finding good approximation algorithms for the subcases that are not covered here remains an open problem.


8. A HYBRID ALGORITHM FOR THE A2A MAPPING SCHEMA PROBLEM 

In the previous sections, we provided algorithms for different-sized and almost equal-sized inputs. The hybrid approach considers both different-sized and almost equal-sized inputs together. The objective of the hybrid approach is to place inputs to two different-sized bins, and then consider each of the bins as a single input.

Specifically, the hybrid approach uses the previously given bin-packing-based approximation algorithm and Algorithms 1-3. We divide the given $m$ inputs into two disjoint sets according to their input size, and then use the bin-packing-based approximation algorithm and Algorithm 1, 2, or 3, depending on the size of inputs.

Algorithm 4. We divide the $m$ inputs into two sets: $A$, which holds each input $i$ of size $q/3 < w_i \leq q/2$, and $B$, which holds all the inputs of sizes less than or equal to $q/3$. Algorithm 4 consists of four steps, as follows:


Fig. 14: An example to show the working of Algorithm 4. We are given 15 inputs, where inputs 
ii to ii are of sizes greater than and all the other inputs are of sizes less than or equal to 


(1) Use the bin-packing-based approximation algorithm to place all the inputs of:

    (a) the set $A$ to bins of size $q/2$, and each such bin is considered as a single input of size $q/2$ that we call the big input. Consider that $x$ big inputs are obtained,

    (b) the set $B$ twice, first to bins of size $q/2$, where each bin is considered as a single input of size $q/2$ that we call the medium input, and second, to bins of size $q/3$, where each bin is also considered as a single input of size $q/3$ that we call the small input. Consider that $y$ medium and $z$ small inputs are obtained.

(2) Use $x(x - 1)/2$ reducers to assign each pair of big inputs.

(3) Use $x \times y$ reducers to assign each big input with each medium input.

(4) Use the AU method, Algorithm 2, or 3 on the $z$ small inputs, depending on the case, to assign each pair of small inputs.


We present an example to illustrate Algorithm 4 in Figure 14. Note that the use of $x(x - 1)/2$ reducers assigns each pair of original inputs whose size is between $q/3$ and $q/2$. Also, by using $x \times y$ reducers, we assign each big input (or original inputs whose size is between $q/3$ and $q/2$) with each original input whose size is at most $q/3$. Further, the AU method, Algorithm 2, or 3 assigns each pair of original inputs whose size is less than or equal to $q/3$.


Algorithm correctness. The algorithm correctness shows that every pair of inputs is assigned 
to reducers. Specifically, the algorithm correctness shows that each pair of the big inputs is 
assigned to reducers, each of the big inputs is assigned to reducers with each of the medium 
inputs, and each pair of the small inputs is assigned to reducers. 


9. APPROXIMATION ALGORITHMS FOR THE A2A MAPPING SCHEMA PROBLEM WITH AN INPUT > q/2

In this section, we consider the case of an input of size $w_i$, $q/2 < w_i < q$; we call such an input a big input. Note that if there are two big inputs, then they cannot be assigned to a single reducer, and hence, there is no solution to the A2A mapping schema problem. We assume that $m$ inputs of different sizes are given. There is one big input, and all the remaining $m - 1$ inputs, which we call the small inputs, have size at most $q - w_i$. We consider the following three cases in this section:

(1) The big input has size $w_i$, where $q/2 < w_i \leq 2q/3$,

(2) The big input has size $w_i$, where $2q/3 < w_i \leq 3q/4$,

(3) The big input has size $w_i$, where $3q/4 < w_i < q$.
Fig. 15: An example to show an assignment of a big input of size $q/2 < w_i \leq 2q/3$ with all the remaining inputs of sizes less than or equal to $q/3$.


The communication cost is dominated by the big input. We consider three different cases of the big input to provide efficient algorithms in terms of the communication cost, where the first two cases can assign inputs to an almost optimal number of reducers, which results in almost minimum communication cost. We use the previously given bin-packing-based approximation algorithm and Algorithms 1-4 to provide a solution to the A2A mapping schema problem for the case of a big input.

A simple solution is to use the FFD or BFD bin-packing algorithm to place the small inputs to bins of size $q - w_i$. Now, we consider each of the bins as a single input of size $q - w_i$. Let $x$ bins be used. We assign each of the $x$ bins to one reducer with a copy of the big input. Further, we assign the small inputs to bins of size $q/2$, and consider each of such bins as a single input of size $q/2$. Now, we can assign each pair of bins (each of size $q/2$) to reducers. In this manner, each pair of inputs is assigned to reducers.
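The simple solution above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: sizes are positive integers, `ffd` is a plain First-Fit Decreasing helper written here for the example, and the names `big_input_schema`, `big`, and `small` are hypothetical.

```python
from itertools import combinations


def ffd(sizes, capacity):
    """First-Fit Decreasing: returns bins as lists of indices into `sizes`."""
    bins, loads = [], []
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        for b in range(len(bins)):
            if loads[b] + sizes[idx] <= capacity:
                bins[b].append(idx)
                loads[b] += sizes[idx]
                break
        else:  # no existing bin has room: open a new one
            bins.append([idx])
            loads.append(sizes[idx])
    return bins


def big_input_schema(big, small, q):
    """Return reducers as sets; 'big' is the big input, ints index `small`."""
    assert 2 * big > q and big < q, "expects q/2 < big < q"
    assert all(w <= q - big for w in small), "small inputs must fit next to the big input"
    # Step 1: bins of size q - big, each sent to one reducer with a copy of the big input.
    reducers = [{"big", *b} for b in ffd(small, q - big)]
    # Step 2: bins of size q/2 (floored for integer sizes), every pair of bins to one reducer.
    half_bins = ffd(small, q // 2)
    if len(half_bins) == 1:  # a single bin already holds all small inputs together
        reducers.append(set(half_bins[0]))
    for b1, b2 in combinations(half_bins, 2):
        reducers.append(set(b1) | set(b2))
    return reducers


if __name__ == "__main__":
    for r in big_input_schema(big=6, small=[4, 3, 2, 2, 1, 1], q=10):
        print(sorted(r, key=str))
```

The big input is replicated once per bin of size $q - w_i$, which is why it dominates the communication cost when the small inputs are close to $q - w_i$ in size.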


The big input of size $q/2 < w_i \leq 2q/3$. In this case, we assume that the small inputs have size at most $q/3$. We use the First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithm, the AU method (Section 5.3), and Algorithms 2, 3 (Section 7). We proceed as follows:


(1) First assign the big input with the small inputs.

    (a) Use a bin-packing algorithm to place the small inputs to bins of size $q/3$. Now, we consider each of the bins as a single input of size $q/3$.

    (b) Consider that $x$ bins are used. Assign each of the bins to one reducer with a copy of the big input.

(2) Depending on the number of bins, we use the AU method or Algorithms 2, 3 to assign each pair of the small inputs to reducers.


An example is given in Figure 15, where we place the small inputs to 9 bins of size $q/3$ and assign each of the bins to one reducer with a copy of the big input. Further, we implement the AU method on the 9 bins to assign each pair of the small inputs.


The big input of size $2q/3 < w_i \leq 3q/4$. In this case, we assume that the small inputs have size at most $q/4$. We use a bin-packing algorithm and the previously given algorithms. We proceed as follows:

(1) First assign the big input with the small inputs.

    (a) Use a bin-packing algorithm to place the small inputs to bins of size $q/4$.

    (b) Consider that $x$ bins are used. Assign each of the bins to one reducer with a copy of the big input.

(2) Depending on the number of bins, we use the applicable algorithm to assign each pair of small inputs.


The big input of size $3q/4 < w_i < q$. In this case, we assume that the small inputs have size at most $q - w_i$. We use a bin-packing algorithm and place the small inputs to bins of size $q - w_i$. We then place each of the bins to one reducer with a copy of the big input. Note that we have not yet assigned each pair of small inputs. In order to assign each pair of small inputs, we use the bin-packing-based approximation algorithm (Section 4.1) or Algorithms 1-4, depending on the size of the small inputs.


Theorem 9.1 (Upper bounds from algorithm). For a list of $m$ inputs with a big input, $i$, of size $q/2 < w_i < q$, and for a given reducer capacity $q$, $q < s' < s$, an input is replicated to at most $m - 1$ reducers for the A2A mapping schema problem, and the number of reducers and the communication cost are at most $m - 1 + \frac{8s'^2}{q^2}$ and $(m - 1)q + \frac{4s'^2}{q}$, respectively, where $s'$ is the sum of all the input sizes except the size of the big input and $s$ is the sum of all the input sizes.


Proof. The big input $i$ can share a reducer with inputs whose sum of the sizes is at most $q - w_i$. In order to assign the input $i$ with all the remaining $m - 1$ small inputs, it is required to assign a sublist of the $m - 1$ inputs whose sum of the sizes is at most $q - w_i$. If all the small inputs are of size almost $q - w_i$, then a reducer can hold the big input and one of the small inputs. Hence, the big input is required to be sent to at most $m - 1$ reducers, which results in at most $(m - 1)q$ communication cost.

Also, each pair of all the small inputs is assigned to reducers (by first placing them to bins of size $q/2$ using the FFD or BFD bin-packing algorithm). The assignment of all the small inputs results in at most $\frac{8s'^2}{q^2}$ reducers and at most $\frac{4s'^2}{q}$ communication cost (Theorem 4.5). Thus, the number of reducers is at most $m - 1 + \frac{8s'^2}{q^2}$ and the communication cost is at most $(m - 1)q + \frac{4s'^2}{q}$. □


Approximation factor. The optimal communication cost (from Theorem 4.1) is $s'^2/q$ and the communication cost of the algorithm (from Theorem 9.1) is $(m - 1)q + 4s'^2/q$. Thus, the ratio between the optimal communication and the communication of our mapping schema is $\frac{s'^2}{(m - 1)q^2 + 4s'^2}$.


10. AN APPROXIMATION ALGORITHM FOR THE X2Y MAPPING SCHEMA PROBLEM 

We propose an approximation algorithm for the X2Y mapping schema problem that is based on bin-packing algorithms. Two lists, $X$ of $m$ inputs and $Y$ of $n$ inputs, are given. We assume that the sum of input sizes of the lists $X$, denoted by $sum_x$, and $Y$, denoted by $sum_y$, is greater than $q$. We analyze the algorithm on the criteria (number of reducers and the communication cost) given in Section 4. We look at the lower bounds in Theorem 10.1, and Theorem 10.2 gives an

upper bound from the algorithm. The bounds are given in Table [ 


Theorem 10.1 (Lower bounds on the communication cost and number of reducers). For a list $X$ of $m$ inputs, a list $Y$ of $n$ inputs, and a given reducer capacity $q$, the communication cost and the number of reducers, for the X2Y mapping schema problem, are at least $\frac{2 \cdot sum_x \cdot sum_y}{q}$ and $\frac{2 \cdot sum_x \cdot sum_y}{q^2}$, respectively, where $q$ is the reducer capacity, $sum_x$ is the sum of input sizes of the list $X$, and $sum_y$ is the sum of input sizes of the list $Y$.

Proof. Since an input $i$ of the list $X$ and an input $j$ of the list $Y$ are replicated to at least $\frac{sum_y}{q}$ and $\frac{sum_x}{q}$ reducers, respectively, the communication cost for the inputs $i$ and $j$ are $w_i \times \frac{sum_y}{q}$ and $w_j \times \frac{sum_x}{q}$, respectively. Hence, the communication cost will be at least

$$\sum_{i=1}^{m} w_i \frac{sum_y}{q} + \sum_{j=1}^{n} w_j \frac{sum_x}{q} = \frac{2 \cdot sum_x \cdot sum_y}{q}.$$

Since the number of bits to be assigned to reducers is at least $\frac{2 \cdot sum_x \cdot sum_y}{q}$ and a reducer can hold inputs whose sum of the sizes is at most $q$, the number of reducers must be at least $\frac{2 \cdot sum_x \cdot sum_y}{q^2}$.


Bin-packing-based approximation algorithm for the X2Y mapping schema problem. 

A solution to the X2Y mapping schema problem for different-sized inputs can be achieved using bin-packing algorithms. Let two lists, $X$ of $m$ inputs and $Y$ of $n$ inputs, be given. The algorithm will not work when one list holds an input of size $w_i$ and the other list holds an input of size greater than $q - w_i$, because these inputs cannot be assigned to a single reducer in common. Let the size of the largest input, $i$, of the list $X$ be $w_i$; hence, all the inputs of the list $Y$ have size at most $q - w_i$. We place the inputs of the list $X$ to bins of size $w_i$, and let $x$ bins be used to place the $m$ inputs. Also, we place the inputs of the list $Y$ to bins of size $q - w_i$, and let $y$ bins be used to place the $n$ inputs. Now, we consider each of the bins as a single input, and a solution to the X2Y mapping schema problem is obtained by assigning each of the $x$ bins with each of the $y$ bins to reducers. In this manner, we require $x \cdot y$ reducers.
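The bin-packing construction just described can be sketched as follows. This is an illustrative sketch only: it assumes First-Fit Decreasing as the bin-packing heuristic, and the helper `ffd` and the function name `x2y_schema` are hypothetical names introduced for the example.

```python
def ffd(sizes, capacity):
    """First-Fit Decreasing: returns bins as lists of indices into `sizes`."""
    bins, loads = [], []
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        for b in range(len(bins)):
            if loads[b] + sizes[idx] <= capacity:
                bins[b].append(idx)
                loads[b] += sizes[idx]
                break
        else:  # no existing bin has room: open a new one
            bins.append([idx])
            loads.append(sizes[idx])
    return bins


def x2y_schema(xs, ys, q):
    """Return reducers as (x_bin, y_bin) pairs of index lists into xs and ys."""
    w_max = max(xs)  # size of the largest input of the list X
    assert all(w <= q - w_max for w in ys), "a Y input cannot share a reducer with the largest X input"
    x_bins = ffd(xs, w_max)      # bins of size w_i
    y_bins = ffd(ys, q - w_max)  # bins of size q - w_i
    # Every X bin meets every Y bin in exactly one reducer: x * y reducers in total.
    return [(bx, by) for bx in x_bins for by in y_bins]


if __name__ == "__main__":
    for bx, by in x2y_schema([5, 3, 2, 2], [4, 1, 3, 2], q=10):
        print("X inputs", bx, "with Y inputs", by)
```

Because an X bin holds at most $w_i$ and a Y bin at most $q - w_i$, every reducer respects the capacity $q$ by construction.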


Theorem 10.2 (Upper bounds from the algorithm). For a bin size $b$, a given reducer capacity $q = 2b$, and with each input of lists $X$ and $Y$ being of size at most $b$, the number of reducers and the communication cost, for the X2Y mapping schema problem, are at most $\frac{4 \cdot sum_x \cdot sum_y}{b^2}$ and at most $\frac{4 \cdot sum_x \cdot sum_y}{b}$, respectively, where $sum_x$ is the sum of input sizes of the list $X$, and $sum_y$ is the sum of input sizes of the list $Y$.


Proof. A bin $i$ can hold inputs whose sum of the sizes is at most $b$. Hence, it is required to divide the inputs of the lists $X$ and $Y$ into at least $\frac{sum_x}{b}$ and $\frac{sum_y}{b}$ bins, respectively. Since the FFD or BFD bin-packing algorithm ensures that all the bins (except only one bin) are at least half-full, each bin of size $b$ holds inputs whose sum of the sizes is at least $\frac{b}{2}$. Thus, all the inputs of the lists $X$ and $Y$ can be placed in at most $\frac{sum_x}{b/2}$ and $\frac{sum_y}{b/2}$ bins of size $b$, respectively.

Let $x$ $\left(= \frac{2 \cdot sum_x}{b}\right)$ and $y$ $\left(= \frac{2 \cdot sum_y}{b}\right)$ bins be used to place the inputs of the lists $X$ and $Y$, respectively. Since each bin is considered as a single input, we can assign each of the $x$ bins with each of the $y$ bins at reducers, and hence, we require at most $\frac{4 \cdot sum_x \cdot sum_y}{b^2}$ reducers.

Since each bin that contains inputs of the list $X$ (resp. $Y$) is replicated to at most $\frac{2 \cdot sum_y}{b}$ (resp. at most $\frac{2 \cdot sum_x}{b}$) reducers, the replication of individual inputs of the list $X$ (resp. $Y$) is at most $\frac{2 \cdot sum_y}{b}$ (resp. at most $\frac{2 \cdot sum_x}{b}$), and the communication cost is at most

$$\sum_{1 \leq i \leq m} w_i \times \frac{2 \cdot sum_y}{b} + \sum_{1 \leq j \leq n} w_j \times \frac{2 \cdot sum_x}{b} = \frac{4 \cdot sum_x \cdot sum_y}{b}.$$

□


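Approximation factor. The optimal communication is $\frac{2 \cdot sum_x \cdot sum_y}{q}$. Thus, the ratio between the optimal communication and the communication of our mapping schema is $\frac{2 \cdot sum_x \cdot sum_y / q}{4 \cdot sum_x \cdot sum_y / b} = \frac{b}{2q} = \frac{1}{4}$, since $q = 2b$.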

11. CONCLUSION 

Two new important practical aspects in the context of MapReduce, namely different-sized inputs and the reducer capacity, are introduced for the first time. The capacity of a reducer is defined in terms of the reducer's memory size. We note that processing time is typically proportional to the memory capacity. All reducers have an identical capacity, and no reducer can hold inputs whose total size exceeds the reducer capacity. We demonstrated the importance of the capacity aspect by considering two common mapping schema problems of MapReduce: the A2A mapping schema problem, in which every two inputs are required to be assigned to at least one common reducer, and the X2Y mapping schema problem, in which every two inputs, the first from a list X and the second from a list Y, are required to be assigned to at least one common reducer. Unfortunately, finding solutions to the A2A and the X2Y mapping schema problems that use the minimum number of reducers turned out to be NP-hard. On the positive side, we present near-optimal approximation algorithms for the A2A and the X2Y mapping schema problems.

Mapping schemas for the case of reducers with different capacities are left for future research. Nevertheless, there exists a reduction to our proposed algorithms that may yield reasonable performance in some cases. In particular, we can consider a common divisor of all the non-identical reducer capacities as a unit-sized reducer capacity. Then, we can follow our proposed algorithms to solve problems with non-identical reducer capacities.


A. PROOFS OF THEOREMS 1, 2, AND LEMMA 1 


Theorem 1 The problem of finding whether a mapping schema of $m$ inputs of different input sizes exists, where every two inputs are assigned to at least one of $z \geq 3$ identical-capacity reducers, is NP-hard.


Proof. The proof is by a reduction from the partition problem [Garey and Johnson 1979], which is a known NP-complete problem. The partition problem is defined as follows: given a set $I = \{i_1, i_2, \ldots, i_m\}$ of $m$ positive integer numbers, it is required to find two disjoint subsets, $S_1 \subset I$ and $S_2 \subset I$, so that the sum of numbers in $S_1$ is equal to the sum of numbers in $S_2$, $S_1 \cap S_2 = \emptyset$, and $S_1 \cup S_2 = I$.

We are given $m$ inputs whose input size list is $W = \{w_1, w_2, \ldots, w_m\}$, and the sum of the sizes is $s = \sum_{1 \leq i \leq m} w_i$. We add $z - 3$ additional inputs, $ai_1, ai_2, \ldots, ai_{z-3}$, each of size $\frac{s}{2}$. We call these new $z - 3$ inputs ($ai_1, ai_2, \ldots, ai_{z-3}$) the medium inputs. In addition, we add one more additional input, $ai'$, of size $\frac{(z-2)s}{2}$ that we call the big input. Further, we assume that the reducer capacity is $\frac{(z-1)s}{2}$.

The proof proceeds in two steps: (i) we prove that in case the $m$ original inputs can be partitioned, then all the inputs can be assigned to the $z$ reducers such that every two inputs are assigned to at least one reducer, (ii) we prove that in case the mapping schema for all the inputs over the $z$ reducers is successful, then there are two disjoint subsets $S_1$ and $S_2$ of the $m$ original inputs that satisfy the partition requirements. We can assume that if the sum $s$ is not divisible by 2, then the answer to the partition problem is surely "no," so the reduction of the partition problem to the A2A mapping schema problem is trivial.


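Fig. 16: Proof of NP-hardness of the A2A mapping schema problem for z > 2 identical-capacity reducers, Theorem 3.1. The $z$ reducers hold, respectively: $w_1, w_2, \ldots, w_m$ together with $ai_1, ai_2, \ldots, ai_{z-3}$; $ai_j$ together with $ai'$, for each $j = 1, \ldots, z - 3$; Subset 1 of $W$ together with $ai'$; and Subset 2 of $W$ together with $ai'$.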


We first show that if there are two disjoint subsets $S_1$ and $S_2$ of equal size of the $m$ original inputs, then there must exist a solution to the A2A mapping schema problem. Recall that any of the reducers can hold a set of inputs whose sum of the sizes is at most $\frac{(z-1)s}{2}$, and the sum of the sizes of the new $z - 3$ medium inputs is exactly $\frac{(z-3)s}{2}$. Hence, all the $m$ original inputs ($i_1, i_2, \ldots, i_m$) and the list of the $z - 3$ medium inputs can be assigned to a single reducer (out of the $z$ reducers), and this assignment uses $s + \frac{(z-3)s}{2}$ capacity, which is exactly the capacity of any reducer. Further, the big input, $ai'$, of size $\frac{(z-2)s}{2}$ can share the same reducer with only one medium input $ai_i$ (it could also share with original inputs). Thus, the big input, $ai'$, and all the medium inputs are assigned to $z - 3$ reducers (out of the remaining $z - 1$ reducers). In addition, the remaining two reducers can be used for the following assignment: the first reducer is assigned the set $S_1$ and the big input, $ai'$, and the second reducer is assigned the set $S_2$ and the big input, $ai'$. The above assignment is a solution to the A2A mapping schema problem for the given $m$ original inputs, the $z - 3$ medium inputs, and the big input using $z$ reducers; see Figure 16.

Now, we show that a solution to the A2A mapping schema problem, for all the inputs over the $z$ reducers, results in a partition of the $m$ original inputs into two equal-sized blocks. We also show that in a solution to the A2A mapping schema problem, each of the $m$ original inputs and every medium input, $ai_i$, are assigned to exactly two reducers, and the big input, $ai'$, is assigned to exactly $z - 1$ reducers. Recall that the total sum of the sizes is

$$s + \frac{(z-3)s}{2} + \frac{(z-2)s}{2} = \frac{(2z-3)s}{2}.$$

Due to the capacity of a single reducer, all the inputs cannot be assigned to a single reducer; only a subset of the inputs, whose sum of the sizes is at most $\frac{(z-1)s}{2}$, can be assigned to one reducer. Thus, each input is assigned to at least two reducers in order to be coupled with all the other inputs.

Moreover, the big input, $ai'$, can share the same single reducer with only a sublist, $S'$, whose sum of the sizes is at most $\frac{s}{2}$. Hence, the big input, $ai'$, is required to be assigned to at least $z - 3$ reducers in order to be paired with the medium inputs $ai_i$. Furthermore, the big input, $ai'$, can share the same reducer with a sublist of the $m$ original inputs whose sum of the sizes is at most $\frac{s}{2}$. This fact means that the big input, $ai'$, must be assigned to two more reducers. On the other hand, all the medium inputs can share the same reducer with the original $m$ inputs. Thus, here, the total reducer capacity occupied by all the inputs is

$$2 \times \sum_{1 \leq i \leq m} w_i + 2 \times \frac{(z-3)s}{2} + (z-1) \times \frac{(z-2)s}{2} = 2s + (z-3)s + \frac{(z-1)(z-2)s}{2} = \frac{(z-1)zs}{2},$$

which is exactly the total capacity of all the $z$ reducers. Thus, each of the $m$ original inputs and each medium input $ai_i$ cannot be assigned more than twice, and hence, each is assigned exactly twice. In addition, the big input, $ai'$, is assigned to exactly $z - 1$ reducers. This fact also shows that all the reducers are entirely filled with distinct inputs. Thus, a solution to the A2A mapping schema problem yields a partition of the $m$ original inputs into blocks $S_1$ and $S_2$, where the sum of the input sizes of any block is exactly $\frac{s}{2}$. Therefore, if there is a polynomial-time algorithm to construct the mapping schema, where every input is required to be paired with every other input, then the mapping schema finds the partition of the $m$ original inputs in polynomial time. □
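The forward direction of the reduction can be checked mechanically. The sketch below is only an illustration of the construction described in the proof: the function name `a2a_schema_from_partition`, the tuple labels, and the example weights are hypothetical, and the partition block $S_1$ is assumed to be given.

```python
from itertools import combinations


def a2a_schema_from_partition(weights, s1, z):
    """Build the reduction instance and the z-reducer schema from a partition.

    weights: sizes of the m original inputs; s1: indices of one partition block.
    Returns (sizes, q, reducers), where reducers are sets of input labels.
    """
    s = sum(weights)
    assert z >= 3 and s % 2 == 0
    assert 2 * sum(weights[i] for i in s1) == s, "s1 must be half of the total size"
    sizes = {("orig", i): w for i, w in enumerate(weights)}
    sizes.update({("medium", j): s // 2 for j in range(z - 3)})  # z - 3 medium inputs of size s/2
    big = ("big", 0)
    sizes[big] = (z - 2) * s // 2                                # big input of size (z-2)s/2
    q = (z - 1) * s // 2                                         # reducer capacity (z-1)s/2
    origs = [k for k in sizes if k[0] == "orig"]
    mediums = [k for k in sizes if k[0] == "medium"]
    reducers = [set(origs) | set(mediums)]                       # all original and medium inputs
    reducers += [{med, big} for med in mediums]                  # big input with each medium input
    reducers.append({("orig", i) for i in s1} | {big})           # S1 with the big input
    reducers.append({k for k in origs if k[1] not in s1} | {big})  # S2 with the big input
    return sizes, q, reducers


if __name__ == "__main__":
    sizes, q, reducers = a2a_schema_from_partition([3, 1, 1, 2, 2, 1], {0, 3}, z=5)
    print(q, [sum(sizes[k] for k in r) for r in reducers])
```

In this construction every reducer is filled exactly to capacity, which is the counting fact the reverse direction of the proof relies on.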


Theorem 2 The problem of finding whether a mapping schema of $m$ and $n$ inputs of different input sizes that belong to list $X$ and list $Y$, respectively, exists, where every two inputs, the first from $X$ and the second from $Y$, are assigned to at least one of $z \geq 2$ identical-capacity reducers, is NP-hard.


Proof. The proof is by a reduction from the partition problem [Garey and Johnson 1979], which is a known NP-complete problem. We are given a list of inputs $I = \{i_1, i_2, \ldots, i_m\}$ whose input size list is $W = \{w_1, w_2, \ldots, w_m\}$, and the sum of the sizes is $s = \sum_{1 \leq i \leq m} w_i$. We add $z - 2$ additional inputs, $ai_1, ai_2, \ldots, ai_{z-2}$, each of size $\frac{s}{2}$. We call these new $z - 2$ inputs ($ai_1, ai_2, \ldots, ai_{z-2}$) the big inputs. In addition, we add one more additional input, $ai'$, of size 1 that we call the small input. Further, we assume that the reducer capacity is $1 + \frac{s}{2}$. Now, the list $I$ holds $m + z - 1$ inputs.

For the X2Y mapping schema problem, we consider the $m$ original inputs and the $z - 2$ big inputs as a list $X$, and the small input as a list $Y$. A solution to the X2Y mapping schema problem assigns each of the $m$ original inputs and each big input (of the list $X$) with the small input of the list $Y$.
Fig. 17: Proof of NP-hardness of the X2Y mapping schema problem for z > 1 identical-capacity reducers, Theorem 3.2. The $z$ reducers hold, respectively: the input of $Y$ together with $ai_j$, for each $j = 1, \ldots, z - 2$; the input of $Y$ together with Subset 1 of $W$; and the input of $Y$ together with Subset 2 of $W$.


The proof proceeds in two steps: (i) we prove that in case the $m$ original inputs can be partitioned, then all the $m$ original inputs, the $z - 2$ big inputs, and the small input can be assigned to the $z$ reducers such that they satisfy the X2Y mapping schema problem, (ii) in case the X2Y mapping schema problem is successful, then there are two disjoint subsets, $S_1$ and $S_2$, of the $m$ original inputs that satisfy the partition requirements.

We first show that if there are two disjoint subsets $S_1$ and $S_2$ of equal size of the $m$ original inputs, then there must exist a solution to the X2Y mapping schema problem. Recall that any of the reducers can hold a set of inputs whose sum of sizes is at most $1 + \frac{s}{2}$, and the sum of the sizes of the new $z - 2$ big inputs is exactly $\frac{(z-2)s}{2}$. Hence, the small input, $ai'$, of size 1 and each big input, $ai_i$, can be assigned to $z - 2$ reducers (out of the $z$ reducers), and each such assignment uses $1 + \frac{s}{2}$ capacity, which is exactly the capacity of any reducer. In addition, the remaining two reducers can be used for the following assignment: the first remaining reducer is assigned the set $S_1$ and the small input, $ai'$, and the second remaining reducer is assigned the remaining original inputs, $S_2$, and the small input, $ai'$. The above assignment is a solution to the X2Y mapping schema problem (for the given $m + z - 2$ inputs of the list $X$ and the one input of the list $Y$ using $z$ reducers; see Figure 17).

Now, we prove the second claim that a solution to the X2Y mapping schema problem results in a partition of the $m$ original inputs into two equal-sized blocks. Recall that the total sum of the sizes is

$$s + \frac{(z-2)s}{2} + 1 = \frac{zs}{2} + 1.$$

Due to the capacity of a single reducer, all the inputs cannot be assigned to a single reducer; only a sublist of the inputs, whose sum of the sizes is at most $1 + \frac{s}{2}$, can be assigned to a single reducer. We show that the small input, $ai'$, must be assigned to all the $z$ reducers. The small input, $ai'$, of size one can share the same single reducer with only a subset, $S'$, whose sum of the sizes is at most $\frac{s}{2}$. Hence, the small input, $ai'$, is required to be assigned to $z - 2$ reducers (out of the $z$ reducers) in order to be paired with all the big inputs $ai_i$, and to the remaining two reducers in order to be paired with all the $m$ original inputs. As a result, a solution to the X2Y mapping schema problem yields a partition of the $m$ original inputs into blocks $S_1$ and $S_2$, where the sum of the input sizes of any block is exactly $\frac{s}{2}$. Therefore, if there is a polynomial-time algorithm to construct the mapping schema, where every input of one list is required to be paired with every input of another list, then the mapping schema finds the partition of the $m$ original inputs in polynomial time. □
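As with Theorem 1, the forward direction can be illustrated with a short sketch. The function name `x2y_schema_from_partition`, the tuple labels, and the example weights are hypothetical; the partition block $S_1$ is assumed to be given.

```python
def x2y_schema_from_partition(weights, s1, z):
    """Build the X2Y reduction instance and its z-reducer schema from a partition.

    List X holds the m original inputs and z - 2 big inputs of size s/2;
    list Y holds one small input of size 1; the reducer capacity is 1 + s/2.
    """
    s = sum(weights)
    assert z >= 2 and s % 2 == 0
    assert 2 * sum(weights[i] for i in s1) == s, "s1 must be half of the total size"
    small = ("small", 0)
    sizes = {("orig", i): w for i, w in enumerate(weights)}
    sizes.update({("big", j): s // 2 for j in range(z - 2)})
    sizes[small] = 1
    q = 1 + s // 2
    reducers = [{("big", j), small} for j in range(z - 2)]       # each big input with the small input
    reducers.append({("orig", i) for i in s1} | {small})         # S1 with the small input
    reducers.append({("orig", i) for i in range(len(weights)) if i not in s1} | {small})  # S2
    return sizes, q, reducers


if __name__ == "__main__":
    sizes, q, reducers = x2y_schema_from_partition([3, 1, 1, 2, 2, 1], {0, 3}, z=4)
    print(q, [sum(sizes[k] for k in r) for r in reducers])
```

Here the small input appears in all $z$ reducers, matching the claim in the second half of the proof.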


Lemma 1 Let $q$ be the reducer capacity. Let the size of an input be $\frac{q}{2}$. Each pair of $u = 2^i$, $i > 0$, inputs can be assigned to $2^i - 1$ teams of $2^{i-1}$ reducers in each team.

Proof. The proof is by induction on $i$.

Basis case. For $i = 1$, we have $u = 2$ inputs, and we can assign them to a team of one reducer of capacity $q$. Hence, the lemma holds for ($i = 1$) two inputs.

Inductive step. Assume that the inductive hypothesis is true: there is a solution for $u = 2^{i-1}$ inputs, where all pairs of the $2^{i-1}$ inputs are assigned to $2^{i-1} - 1$ teams of $2^{i-2}$ reducers in each team and have the team property (each team has one occurrence of each input, which we will prove in algorithm correctness). Now, we can build a solution for $m = 2^i$ inputs, as follows:

(a) Divide the $m = 2^i$ inputs into two groups of $2^{i-1}$ inputs in each group,

(b) Recursively create teams for each of the two groups,

(c) Create some of the teams for the $2^i$ inputs by combining the $j$-th team from the first group with the $j$-th team from the second group. Since by the inductive hypothesis we have a solution for $u = 2^{i-1}$ inputs, we can assign the inputs of these two groups to $2 \cdot (2^{i-1} - 1)$ teams of $2^{i-2}$ reducers in each team. And, by combining the $j$-th teams, where $j = 1, 2, \ldots, (2^{i-1} - 1)$, of each group, there are $2^{i-1} - 1$ teams of $2^{i-1}$ reducers in each team; see Teams 5-7 for 8 inputs in the corresponding figure.

(d) Create $2^{i-1}$ additional teams that pair the inputs from the first group with inputs from the second group. In each team, the $j$-th input from the first group is assigned to the $j$-th reducer. In the first team, the $j$-th input from the second group is also assigned to the $j$-th reducer. In subsequent teams, the assignments from the second group rotate, so in the $k$-th team, the $j$-th input from the second group is assigned to reducer $k + j - 1$ (wrapping around modulo $2^{i-1}$); see Teams 1-4 for 8 inputs in the corresponding figure.

By steps (c) and (d), there are in total $2^{i-1} - 1 + 2^{i-1} = 2^i - 1$ teams of $2^{i-1}$ reducers in each team, and these teams hold each pair of the $m = 2^i$ inputs. □
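The inductive construction can be written as a short recursive routine. This is a sketch under the assumptions of Lemma 1 (a power-of-two number of inputs, two inputs per reducer); the function name `build_teams` is hypothetical, and the order of the teams and the direction of the rotation are implementation choices that may differ from the figure.

```python
def build_teams(inputs):
    """Return a list of teams; each team is a list of reducers (pairs of inputs).

    For 2^i inputs this yields 2^i - 1 teams of 2^(i-1) reducers, and every
    input appears exactly once in every team.
    """
    n = len(inputs)
    assert n >= 2 and n & (n - 1) == 0, "the number of inputs must be a power of two"
    if n == 2:  # basis case: one team with one reducer
        return [[(inputs[0], inputs[1])]]
    half = n // 2
    first, second = inputs[:half], inputs[half:]          # step (a)
    teams_first = build_teams(first)                      # step (b)
    teams_second = build_teams(second)
    # Step (c): combine the j-th team of each group into one team.
    teams = [a + b for a, b in zip(teams_first, teams_second)]
    # Step (d): cross teams; the second group is rotated by `shift` positions.
    for shift in range(half):
        teams.append([(first[j], second[(j + shift) % half]) for j in range(half)])
    return teams


if __name__ == "__main__":
    for number, team in enumerate(build_teams(list(range(1, 9))), start=1):
        print("Team", number, team)
```

For 8 inputs the routine produces 7 teams of 4 reducers, and the 28 reducers cover each of the 28 pairs exactly once.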


B. PSEUDOCODE AND CORRECTNESS OF ALGORITHM 1A 


Algorithm description. First, we divide the $m$ inputs (which are actually bins of size $\frac{q}{k}$, $k > 3$, obtained after placing all the given $m'$ inputs to $m$ bins, each of size $\frac{q}{k}$) into two sets: $A$ of $y$ inputs and $B$ of $x$ inputs. Then, we make $u$ disjoint groups of the $y$ inputs of the set $A$, lines 1-2. (Now, each of the groups is considered as a single input that we call the derived input.) We do not show the addition of dummy inputs and assume that $u$ is a power of 2. Function 2_step_odd_q(lower, upper) recursively divides the derived inputs into two halves, line 8. Function Assignment(lower, mid, upper) (line 9) pairs every two derived inputs and assigns them to the respective reducers (line 11). Each reducer of the last team is assigned using function Last_Team(groupA[]), lines 15-16.

Note that functions 2_step_odd_q(lower, upper), Assignment(lower, mid, upper), and value_b(lower, t, mid, upper) take two common parameters, namely lower and upper, where lower is the first derived input and upper is the last derived input (i.e., the last group) at the time of the first call to the functions, line 3. Once all pairs of the derived inputs are assigned to reducers, line 11, function Assign_input_from_B(Team[]) assigns the $i$-th input of the set $B$ to all the reducers of the $i$-th team, starting at line 17. After that, the algorithm is invoked over the inputs of the set $B$ to assign each pair of the remaining inputs of the set $B$ to reducers, until every pair of the remaining inputs is assigned to reducers.

The algorithm correctness proves that every pair of inputs is assigned to reducers. Specifically, we prove that all those pairs of inputs, $(i, j)$ and $(i', j')$, of the set $A$ that are assigned to the same team satisfy $i \neq i'$ and $j \neq j'$ (Claim B.1). Then we prove that all the inputs of the set $A$ appear exactly once in each team (Claim B.2). We then prove that the set $B$ holds $x \leq y - 1$ inputs when $q = 3$ (Claim B.3). At last, we conclude in Theorem B.4 that Algorithm 1A assigns each pair of inputs to reducers.

Note that we are proving all the above mentioned claims for q = 3; the cases for q > 3 can 
be generalized trivially where we make “ - \q-h/2]] derived inputs from y inputs of the set 
A (and assign in a manner that all the inputs of the A are paired with all the remaining to — 1 
inputs). 


Algorithm 1: Part A

Inputs: m: the number of bins obtained after placing all the given m' inputs (of size $\leq q/k$,
where k > 3 is an odd number) to bins each of size $q/k$,
q: the reducer capacity.

Variables:

A: A set A, where the total inputs in the set A is $y = \lfloor q/2 \rfloor (\lfloor 2m/(q+1) \rfloor + 1)$

B: A set B, where the total inputs in the set B is $x = m - y$

Team[i, j]: represents teams of reducers, where index i indicates the team and index j
indicates the reducer in the team. Consider $u = \lceil 2y/(q-1) \rceil$. There are u - 1 teams of
$r = \lceil u/2 \rceil$ reducers in each team.

groupA[]: represents disjoint groups of inputs of the set A, where groupA[i] indicates the i-th
group of $(q-1)/2$ inputs of the set A.

1 Function create_group(y) begin
2     for $i \leftarrow 1$ to $u$ do groupA[i] $\leftarrow \langle i, i+1, \ldots, i + (q-1)/2 - 1 \rangle$, $i \leftarrow i + (q-1)/2$
3     2_step_odd_q(1, u), Last_Team(groupA[]), Assign_input_from_B(Team[])

4 Function 2_step_odd_q(lower, upper) begin
5     if $\lfloor (upper - lower)/2 \rfloor < 1$ then return
6     else
7         mid $\leftarrow \lceil (upper - lower)/2 \rceil$, Assignment(lower, mid, upper)
8         2_step_odd_q(lower, mid), 2_step_odd_q(mid + 1, upper)

9 Function Assignment(lower, mid, upper) begin
10     while mid > 1 do
11         foreach $(a, t) \in [lower, lower + mid - 1] \times [0, mid - 1]$ do
               Team[$(u - 2 \cdot mid + 1) + t$, $a - \ldots$] $\leftarrow \langle$groupA[a], groupA[value_b(a, t, mid, upper)]$\rangle$

12 Function value_b(a, t, mid, upper) begin
13     if a + t + mid < upper + 1 then return (a + t + mid)
14     else if a + t + mid > upper then return (a + t)

15 Function Last_Team(lower, mid, upper) begin
16     foreach $i \in [1, u]$ do Team[u - 1, i] $\leftarrow \langle$groupA[$2 \times i - 1$], groupA[$2 \times i$]$\rangle$

17 Function Assign_input_from_B(Team[]) begin
18     foreach $(i, j) \in [1, u - 1] \times [1, r]$ do Team[i, j] $\leftarrow$ B[i]


Claim B.1. Pairs of inputs (i, j) and (i', j'), where i = i' or j = j', of the set A are assigned
to different teams.

Proof. First, consider i = i' and $j \neq j'$, where (i, j) and (i', j') must be assigned to two
different teams. If $j \neq j'$, then both the j values may have an identical value of lower and
mid, but they must have two different values of t (see lines 13, 14 of Algorithm 1A), where
j = lower + t + mid or j = lower + t. Thus, for two different values of j, we use two different
values of t, say $t_1$ and $t_2$, which results in an assignment of (i, j) and (i', j') to two different
teams determined by $t_1$ and $t_2$ (note that teams are also selected based on the value of t,
$(y - 2 \cdot mid + 1) + t$, see line 11 of Algorithm 1A, where for q = 3, we have u = y). Suppose now
that $i \neq i'$ and j = j', where (i, j) and (i', j') must be assigned to two different teams. In this
case, we also have two different values of t, and hence, two different t values assign (i, j) and
(i', j') to two different teams ($(y - 2 \cdot mid + 1) + t$, line 11 of Algorithm 1A).

Hence, it is clear that pairs (i, j) and (i', j'), where $i \neq i'$ and $j \neq j'$, are assigned to a
team. □

Claim B.2. All the inputs of the set A appear exactly once in each team. 

Proof. The number of pairs of inputs of the set A equals the number of
reducers ($(y - 1) \lceil y/2 \rceil$) that can provide a solution to the A2A mapping schema problem for the
y inputs of the set A. Recall that the $(y - 1) \lceil y/2 \rceil$ reducers are arranged in the form of (y - 1) teams
of $\lceil y/2 \rceil$ reducers in each team, when q = 3. Note that if there is an input pair (i, j) in team t,
then the team t cannot hold any pair that has either i or j in the remaining $\lceil y/2 \rceil - 1$ reducers.
For the given y inputs of the set A, there are at most $\lceil y/2 \rceil$ disjoint pairs $(i_1, j_1), (i_2, j_2), \ldots,
(i_{\lceil y/2 \rceil}, j_{\lceil y/2 \rceil})$ such that $i_1 \neq i_2 \neq \cdots \neq i_{\lceil y/2 \rceil} \neq j_1 \neq j_2 \neq \cdots \neq j_{\lceil y/2 \rceil}$. Hence, all y inputs of the
set A are assigned to a team, where no input is assigned twice in a team. □
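
As a quick check of the counting step in the proof above, the following worked equation assumes that y is even (as is the case when dummy inputs pad y to a power of 2); the value y = 8 is chosen only for illustration.

```latex
\[
\binom{y}{2} \;=\; \frac{y(y-1)}{2} \;=\; (y-1)\cdot\frac{y}{2},
\qquad\text{e.g., } y = 8:\quad \binom{8}{2} = 28 = 7 \cdot 4 .
\]
```

That is, for y = 8 the 28 pairs fit exactly into 7 teams of 4 reducers, one pair per reducer.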

Claim B.3. When the reducer capacity q = 3, the set B holds at most $x \leq y - 1$ inputs.

Proof. Since a pair of inputs of the set A requires at most q - 1 capacity of a reducer and
each team holds all the inputs of the set A, an input from the set B can be assigned to all the
reducers of the team. In this manner, all the inputs of the set A are also paired with an input
of the set B. Since there are y - 1 teams and each team is assigned an input of the set B, the
set B can hold at most $x \leq y - 1$ inputs. □

Theorem B.4. Algorithm 1A assigns each pair of the given inputs to at least one reducer
in common.

Proof. We have $(y - 1) \lceil y/2 \rceil$ pairs of inputs of the set A, each of size q - 1, and there are the same
number of reducers; hence, each reducer can hold one input pair. Further, the remaining
capacity of all the reducers of each team can be used to assign an input of B. Hence, all
the inputs of A are paired with every other input of A and with every input of B (as we proved in
Claims B.2 and B.3). Once the inputs of the set A are paired with all the m
inputs, the inputs of the set B are paired with each other by following a similar procedure on them. Thus,
Algorithm 1A assigns each pair of the given m inputs to at least one reducer in common. □
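
The argument of Theorem B.4 can be exercised end to end for q = 3 with unit-sized inputs. The sketch below is illustrative only and makes several assumptions that are not in the text: the teams over the set A are built with the classical circle (round-robin) method instead of the recursive 2_step_odd_q construction, y is taken as ⌊m/2⌋ + 1 and bumped to the next even number instead of padding with dummy inputs, and the names `round_robin` and `assign_all_pairs_q3` are hypothetical.

```python
from itertools import combinations


def round_robin(items):
    """Circle method: split all pairs of an even-sized list into len(items)-1 teams."""
    n = len(items)
    assert n >= 2 and n % 2 == 0
    rounds = []
    for r in range(n - 1):
        # The last item stays fixed; the others rotate around it.
        pairs = [(items[n - 1], items[r])]
        for k in range(1, n // 2):
            pairs.append((items[(r + k) % (n - 1)], items[(r - k) % (n - 1)]))
        rounds.append(pairs)
    return rounds


def assign_all_pairs_q3(inputs):
    """Return reducers (sets of at most 3 inputs) in which every pair of inputs meets."""
    m = len(inputs)
    if m < 2:
        return []  # no pair to cover
    y = m // 2 + 1
    if y % 2:
        y += 1  # assumption of this sketch: keep |A| even instead of adding dummies
    set_a, set_b = inputs[:y], inputs[y:]  # |B| = m - y <= y - 1
    reducers = []
    for i, team in enumerate(round_robin(set_a)):
        for a1, a2 in team:
            reducer = {a1, a2}
            if i < len(set_b):
                reducer.add(set_b[i])  # the i-th input of B joins every reducer of team i
            reducers.append(reducer)
    # Pairs inside B are handled by repeating the procedure on B.
    return reducers + assign_all_pairs_q3(set_b)
```

For instance, `assign_all_pairs_q3(list(range(7)))` uses A = {0, 1, 2, 3} and B = {4, 5, 6}: the three teams over A each absorb one input of B, and the recursive call covers the pairs inside B.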


C. PSEUDOCODE AND CORRECTNESS OF ALGORITHM 1B


Algorithm 1: Part B

Inputs: m: the number of bins obtained after placing all the given m' inputs (of size $\leq q/k$,
where k > 4 is an even number) to bins each of size $q/k$,
q: the reducer capacity.

Variables:

Team[i, j]: represents teams of reducers, where index i indicates the team and index j
indicates the reducer in the team. Consider $u = \lceil 2m/q \rceil$. There are u - 1 teams of $\lceil u/2 \rceil$ reducers
in each team.

groupA[]: represents disjoint groups of inputs of the set A, where groupA[i] indicates the i-th
group of $q/2$ inputs of the set A.

1 Function create_group(m) begin
2     for $i \leftarrow 1$ to $u$ do groupA[i] $\leftarrow \langle i, i+1, \ldots, i + q/2 - 1 \rangle$, $i \leftarrow i + q/2$
3     2_step_even_q(1, u), Last_Team(1, $\lceil u/2 \rceil$, u)

4 Function 2_step_even_q(lower, upper) begin
5     if $\lfloor (upper - lower)/2 \rfloor < 1$ then return
6     else
7         mid $\leftarrow \lceil (upper - lower)/2 \rceil$, Assignment(lower, mid, upper)
8         2_step_even_q(lower, mid), 2_step_even_q(mid + 1, upper)


We show that every pair of inputs is assigned to reducers. Specifically, Algorithm 1B
satisfies two claims, as follows:

Claim C.1. Pairs of derived inputs (i, j) and (i', j'), where $i \neq i'$ or $j \neq j'$, are assigned to
a team.

Claim C.2. All the given m inputs appear exactly once in each team.

We do not prove Claims C.1 and C.2. Note that Claim C.1 follows Claim B.1, where
Claim B.1 shows that all the pairs of inputs of the set A (in case q = 3) and all the pairs
of derived inputs of the set A (in case q > 3), (i, j) and (i', j'), where $i \neq i'$ or $j \neq j'$, are
assigned to a team. Also, Claim C.2 follows Claim B.2, where Claim B.2 shows that all the
inputs of the set A appear in each team only once, while in case of Algorithm 1B the set A is
considered as a set of m inputs.


Theorem C.3. Algorithm 1B assigns each pair of the given inputs to at least one reducer
in common.

Proof. Since there are the same number of pairs of the derived inputs and the number 
of reducers, it is possible to assign one pair to each reducer that results in all-pairs of the m 
inputs. □ 
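
The idea behind Theorem C.3 for an even reducer capacity can be illustrated with a short sketch. It assumes unit-sized inputs and simply enumerates every pair of derived inputs, one pair per reducer; it does not reproduce the arrangement into teams performed by 2_step_even_q, and the name `assign_even_q` is hypothetical.

```python
from itertools import combinations


def assign_even_q(inputs, q):
    """Group inputs into derived inputs of size q/2 and give each pair of groups a reducer."""
    assert q >= 2 and q % 2 == 0, "sketch covers even reducer capacities only"
    half = q // 2
    groups = [inputs[i:i + half] for i in range(0, len(inputs), half)]
    if len(groups) == 1:
        return [set(inputs)]  # everything already fits into one reducer
    # Two derived inputs of size q/2 fill a reducer of capacity q exactly.
    return [set(g1) | set(g2) for g1, g2 in combinations(groups, 2)]
```

With 10 inputs and q = 4 this gives u = 5 derived inputs and 5·4/2 = 10 reducers; inputs that share a derived input meet in every reducer that holds their group.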


D. CORRECTNESS OF ALGORITHM 2 

The correctness proof shows that all pairs of inputs are assigned to reducers. Specifically, we show
that each pair of inputs of the set A is assigned to p(p + 1) reducers that use only p capacity
of each reducer (Claims D.1 and D.2). Then, we prove that the set B holds $x \leq m - p^2$ inputs.
At last we conclude that Algorithm 2 assigns each pair of inputs to reducers.


Claim D.1. All the inputs of the set A are assigned to p(p + 1) reducers, and the assignment
of the inputs of the set A uses only p capacity of each reducer.


Claim D.2. All the inputs of the set A appear in each team exactly once. 


We are not proving Claims D.1 and D.2 here. Claims D.1 and D.2 follow the correctness of
the AU method; hence, all the inputs of the set A are placed to p + 1 teams of p bins (each of
size q) in each team, and the assignment of each such bin only uses p capacity of each reducer.
Further, two bins cannot be assigned to a reducer because $2 \times p > q$. Claim D.2 also follows
the correctness of the AU method, and hence, all the inputs of the set A appear only once in
each team.
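
The team structure that Claims D.1 and D.2 rely on can be illustrated with a small sketch. It assumes that the AU method arranges the $p^2$ inputs of the set A in a p × p grid and forms reducers from the lines of that grid (one team per slope, plus one team of rows), which is the standard affine-plane construction for a prime p; the names `au_teams` and `meeting_counts` are hypothetical and the details may differ from the method as presented in the main text.

```python
from collections import Counter
from itertools import combinations


def au_teams(p):
    """Return p+1 teams of p reducers; each reducer holds p cells of a p x p grid (p prime)."""
    teams = []
    for slope in range(p):
        # Reducer `offset` of this team holds one cell per row, along a line of the given slope.
        team = [[(row, (offset + slope * row) % p) for row in range(p)]
                for offset in range(p)]
        teams.append(team)
    # The last team holds the rows themselves, so cells of the same row also meet.
    teams.append([[(row, col) for col in range(p)] for row in range(p)])
    return teams


def meeting_counts(teams):
    """Count, for every pair of cells, the number of reducers in which the pair meets."""
    counts = Counter()
    for team in teams:
        for reducer in team:
            for pair in combinations(sorted(reducer), 2):
                counts[pair] += 1
    return counts
```

For p = 5 this produces 6 teams of 5 reducers, every team contains each of the 25 cells exactly once, and each of the 300 pairs of cells meets in exactly one reducer, while only p units of each reducer's capacity are used.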


Claim D.3. When the reducer capacity is q, the set B holds $x \leq m - p^2$ inputs, where p is
the nearest prime number to q.

Proof. There are p + 1 teams of p reducers in each team, and inputs of the set A use
p capacity of each of the reducers. Hence, each reducer can hold q - p additional unit-sized
(almost identical-sized) inputs. Since all inputs of the set A appear in each team (Claim D.2), an
assignment of q - p additional unit-sized inputs to all the reducers of a team provides pairs
of all the inputs of the set A with the additional inputs. In this manner, p + 1 teams, which hold
inputs of the set A, can hold at most $(p + 1) \times (q - p)$ additional inputs. Since
$p^2 < m \leq p^2 + (p + 1) \times (q - p)$, the set B can hold $x \leq m - p^2$ inputs. □
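
As a numeric illustration of the bound in Claim D.3, take the values q = 8 and p = 7 (chosen here only as an example; they satisfy $2 \times p > q$):

```latex
\[
p^2 = 49, \qquad (p+1)\times(q-p) = 8 \times 1 = 8, \qquad
49 < m \le 57, \qquad x = m - p^2 \le 8 .
\]
```

Each of the 8 teams can therefore absorb one additional unit-sized input of the set B.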


Theorem D.4. Algorithm 2 assigns each pair of inputs to reducers. 

We are not proving Theorem D.4 here. The proof of Theorem D.4 considers the fact that all
the inputs of the set A are paired with each other using the AU method, and they are also
paired with all the remaining inputs of the set B. Further, inputs of the set B will be paired
with each other by using Algorithm 1A or 1B (Theorems B.4 or C.3).